1 Introduction

This document presents the data processing and full analysis, carried out in R, of syllable duration, intensity, and F0 in three token types: Kazakh, Russian, and code-switched (CS) words.

Linguistic Phenomenon: Kazakh-Russian intra-word code-switching: [[Russian Noun] + [Kazakh Noun suffix]]

Working RQ: How do the stress patterns of the two languages interact in word-internal switches? Specifically, does adding Kazakh suffixes to Russian noun stems shift stress to the last syllable, consistent with Kazakh phonology?

Working predictions:

  1. Stress remains fixed on the root: Russian words maintain the same stress pattern as their unsuffixed forms. Stress is determined by the root and does not shift when Kazakh suffixes are added.
  • If this prediction is true:
    • no significant difference between s1:s2 of unsuffixed tokens (reka) and s1:s2 of suffixed CS tokens (rekalar).
    • significant difference in s1:s2:s3 of suffixed CS tokens (rekalar), in favor of stress fixed on the root.
    • significant difference in s3(CS):s3(Kazakh), in favor of Kazakh.
  2. Stress follows Kazakh rules: Russian words are treated like Kazakh words when suffixed, meaning stress is assigned according to Kazakh stress rules, likely resulting in final-syllable stress.
  • If this prediction is true:
    • significant difference between s1:s2 of unsuffixed tokens (reka) and s1:s2 of suffixed CS tokens (rekalar), with the suffixed ratio closer to 1.
    • significant difference in s1:s2:s3 of suffixed CS tokens (rekalar), in favor of s3.
    • no significant difference in s3(CS):s3(Kazakh).
  3. A mix of Russian and Kazakh stress: Russian words exhibit characteristics of both languages. The original Russian stress location may remain, but an additional Kazakh-style final stress may also emerge.
  • If this prediction is true:
    • no significant difference between s1:s2 of unsuffixed tokens (reka) and s1:s2 of suffixed CS tokens (rekalar).
    • no significant difference in s3(CS):s3(Kazakh).
  4. Stress follows Russian suffixation rules: Russian words behave as though they have Russian suffixes, meaning stress is assigned based on Russian stress patterns for the entire word. Some words may shift stress, while others retain their original placement.
  • If this prediction is true:
      1. for Russian roots with mobile stress:
      • significant difference between s1:s2 of unsuffixed tokens (gorod) and s1:s2 of suffixed CS tokens (gorodtar).
      • significant difference in s1:s2:s3 of suffixed CS tokens (gorodtar), in favor of s3.
      2. for immobile roots:
      • no significant difference between s1:s2 of unsuffixed tokens (kniga) and s1:s2 of suffixed CS tokens (knigalar).
      • significant difference in s1:s2:s3 of suffixed CS tokens (knigalar), in favor of root stress (s1, s2).
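
The predictions above can be read as a small decision table. The sketch below is only a reading aid, not part of the analysis pipeline: the labels A, B, and C for the three comparisons, the outcome codes, and the helper `compatible()` are all introduced here for illustration. Comparisons that a prediction does not mention are coded as `NA`, and prediction 2 is read as expecting no s3 difference between CS and Kazakh.

```r
# Expected outcomes per prediction, transcribed from the list above.
# A: s1:s2 ratio, unsuffixed vs. suffixed CS tokens ("diff" / "none")
# B: most prominent part of suffixed CS tokens ("root" / "s3")
# C: s3 in CS vs. s3 in Kazakh ("diff" / "none")
# NA = the prediction says nothing about this comparison.
predictions <- data.frame(
  prediction = c("1 root-fixed", "2 Kazakh rules", "3 mixed",
                 "4a Russian rules, mobile root",
                 "4b Russian rules, immobile root"),
  A = c("none", "diff", "none", "diff", "none"),
  B = c("root", "s3", NA, "s3", "root"),
  C = c("diff", "none", "none", NA, NA),
  stringsAsFactors = FALSE
)

# Return the predictions that are not contradicted by the observed outcomes.
compatible <- function(obs_A, obs_B, obs_C, table = predictions) {
  ok <- function(expected, observed) is.na(expected) | expected == observed
  keep <- ok(table$A, obs_A) & ok(table$B, obs_B) & ok(table$C, obs_C)
  table$prediction[keep]
}

result <- compatible("none", "root", "diff")
print(result)
```

Under this coding, an outcome of "no ratio change, root prominence, s3 differs from Kazakh" is compatible with both prediction 1 and the immobile-root case of prediction 4, so immobile roots alone would not tell these two apart.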

2 Data Aggregation

2.1 Dataset Loading and Restructuring

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Rows: 1976 Columns: 29
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (17): Filename, Word, Annotation, MaxF0Hz, MinF0Hz, MeanF0, Centre_MeanF...
## dbl (12): Word_beg, Word_end, Word_dur_ms, Begin, End, Duration_in_ms, Max_d...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Expected 4 pieces. Additional pieces discarded in 60 rows [19, 42, 54, 96, 97,
## 126, 129, 147, 167, 254, 268, 284, 296, 304, 314, 341, 376, 478, 507, 510,
## ...].
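
The "Expected 4 pieces" warning above is what `tidyr::separate()` emits when a value contains more separators than there are target columns. The loading chunk itself is not shown, so the sketch below uses a made-up `label` column and an underscore separator as assumptions; only the four target column names are taken from the tibble printed later. It shows how to locate the affected rows and how `extra = "merge"` keeps the surplus text instead of silently discarding it.

```r
library(tidyr)

# Hypothetical annotation labels; the real column and separator are not shown.
toy <- data.frame(
  label = c("s1_cv_ka_stressed", "s2_cvc_lar_unstressed_extra")
)

# Flag rows that would trigger the warning (more than 4 pieces).
too_long <- lengths(strsplit(toy$label, "_", fixed = TRUE)) > 4

# extra = "merge" splits at most 4 times and keeps the remainder
# in the last column, so nothing is dropped without notice.
merged <- separate(
  toy, label,
  into  = c("SyllPos", "SyllStr", "SyllIPA", "Stress"),
  sep   = "_",
  extra = "merge"
)

print(too_long)
print(merged$Stress)
```

Inspecting the flagged rows before choosing between `extra = "merge"` and `extra = "drop"` would show whether the 60 discarded pieces carry information that matters for the analysis.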

2.2 Normalize Continuous Values (z-score)

## Warning: There were 3 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `MeanF0 = as.numeric(MeanF0)`.
## Caused by warning:
## ! NAs introduced by coercion
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 2 remaining warnings.
## # A tibble: 1,976 × 43
##    Filename   Speaker Gender Word  Word_beg Word_end Word_dur_ms SyllPos SyllStr
##    <chr>      <chr>   <chr>  <chr>    <dbl>    <dbl>       <dbl> <chr>   <chr>  
##  1 Speaker_1… Speake… male   вече…     1.89     2.50        610. s1      cv     
##  2 Speaker_1… Speake… male   вече…     1.89     2.50        610. s2      cv     
##  3 Speaker_1… Speake… male   вече…     1.89     2.50        610. s3      cv     
##  4 Speaker_1… Speake… male   олжа…     8.48     9.16        677. s1      vc     
##  5 Speaker_1… Speake… male   олжа…     8.48     9.16        677. s2      cv     
##  6 Speaker_1… Speake… male   олжа…     8.48     9.16        677. s3      cvc    
##  7 Speaker_1… Speake… male   шеке…    13.5     14.2         642. s1      cv     
##  8 Speaker_1… Speake… male   шеке…    13.5     14.2         642. s2      cv     
##  9 Speaker_1… Speake… male   шеке…    13.5     14.2         642. s3      cvc    
## 10 Speaker_1… Speake… male   сөре…    18.8     19.4         628. s1      cv     
## # ℹ 1,966 more rows
## # ℹ 34 more variables: SyllIPA <chr>, Stress <chr>, Begin <dbl>, End <dbl>,
## #   Duration_in_ms <dbl>, Max_dB <dbl>, Min_dB <dbl>, Mean_dB <dbl>,
## #   Centre_mean_dB <dbl>, MaxF0Hz <dbl>, MinF0Hz <dbl>, MeanF0 <dbl>,
## #   Centre_MeanF0 <chr>, Language <chr>, SuffixCase <chr>, WordForm <chr>,
## #   LatinScript <chr>, Gloss <chr>, WordClass <chr>, StressedSyll <dbl>,
## #   NounGender <chr>, Declension <dbl>, StressShift <chr>, ShiftDirect <chr>, …
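
The normalization chunk is not shown in the output above, so the following is a minimal sketch of one common way to do it, not the author's code. It assumes that z-scores are computed within each speaker (the large between-speaker F0 variance in the models below would motivate this) and that the coercion warnings come from non-numeric placeholders in the F0 columns; the placeholder string and the toy values are invented.

```r
library(dplyr)

# Toy data: F0 read in as character, with one non-numeric placeholder.
toy <- data.frame(
  Speaker = c("A", "A", "A", "B", "B", "B"),
  MeanF0  = c("110", "120", "--undefined--", "200", "220", "240")
)

normalized <- toy %>%
  # Non-numeric entries become NA; this is the source of the coercion warning.
  mutate(MeanF0 = suppressWarnings(as.numeric(MeanF0))) %>%
  group_by(Speaker) %>%
  # z-score within speaker, ignoring missing values.
  mutate(MeanF0_z = (MeanF0 - mean(MeanF0, na.rm = TRUE)) /
                     sd(MeanF0, na.rm = TRUE)) %>%
  ungroup()

print(normalized)
```

Note that `Centre_MeanF0` is still of type `<chr>` in the tibble above, so it would need the same conversion before it can be normalized.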

3 Descriptive Statistics

3.1 Syllable Duration

OBSERVATION:

3.2 Syllable Intensity

OBSERVATION:

3.3 Pitch Range

3.4 Fundamental Frequency

OBSERVATION:

3.5 Syllable Duration by Language and Syllable Shape

OBSERVATION:

3.6 Syllable Duration by Suffix case

## Syllable Duration

3.7 Kazakh tokens: Compare syllable durations by inflection status

## CS tokens: Syllable duration by stress

### CS tokens: Stress & WordForm interaction

### Compare s3 duration for Kazakh vs CS tokens

3.7.1 Russian tokens: Syllable duration by stress

4 Statistical Analyses

4.1 Fitting a linear model

## 
## Call:
## lm(formula = Duration_in_ms ~ SyllPos + Language, data = df_full_sample)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -190.22  -53.50   -6.03   45.21  606.30 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  216.307      3.340  64.754  < 2e-16 ***
## SyllPoss2     21.983      3.735   5.885 4.67e-09 ***
## SyllPoss3     15.095      4.667   3.234  0.00124 ** 
## LanguageRus   26.791      4.716   5.681 1.54e-08 ***
## LanguageCS    11.259      3.721   3.026  0.00251 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 74.26 on 1949 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.03366,    Adjusted R-squared:  0.03168 
## F-statistic: 16.97 on 4 and 1949 DF,  p-value: 1.088e-13
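
As a reading aid for the coefficient table, the sketch below rebuilds fitted means from the estimates printed above. It assumes default treatment coding with s1 and Kazakh as the reference levels; the level label "Kaz" and the helper `predict_dur()` are placeholders introduced here, since the actual level name does not appear in the output. The model accounts for only about 3% of the variance (R-squared = 0.034), so these means describe broad tendencies at best.

```r
# Estimates copied from the lm() summary above.
coefs <- c(
  "(Intercept)" = 216.307,
  "SyllPoss2"   = 21.983,
  "SyllPoss3"   = 15.095,
  "LanguageRus" = 26.791,
  "LanguageCS"  = 11.259
)

# Fitted mean duration (ms) for a syllable position and language.
# "s1" and "Kaz" are the assumed reference levels (no coefficient added).
predict_dur <- function(syll = "s1", lang = "Kaz") {
  est <- coefs[["(Intercept)"]]
  if (syll != "s1")  est <- est + coefs[[paste0("SyllPos", syll)]]
  if (lang != "Kaz") est <- est + coefs[[paste0("Language", lang)]]
  est
}

kaz_s1 <- predict_dur("s1", "Kaz")   # intercept only
rus_s2 <- predict_dur("s2", "Rus")   # 216.307 + 21.983 + 26.791
cs_s3  <- predict_dur("s3", "CS")    # 216.307 + 15.095 + 11.259
print(c(kaz_s1 = kaz_s1, rus_s2 = rus_s2, cs_s3 = cs_s3))
```

Because the model has no SyllPos-by-Language interaction, the s1-to-s2 difference is forced to be the same 21.98 ms in all three languages, which is one reason the per-language models in Section 5 are more informative.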

OBSERVATION:

4.2 Fitting a linear mixed-effects model

## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
## 
##     lmer
## The following object is masked from 'package:stats':
## 
##     step

5 Checking Prior Assumptions (Reports)

5.1 Kazakh: is stress default final?

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_col()`).

5.2 Kazakh: Reliable correlates of stress

roots_kaz <- kz_all_syll %>%
  filter(WordForm == "uninflected")

suffixed_kaz <- kz_all_syll %>%
  filter(WordForm == "inflected")

model_roots_dur <- lmer(Duration_in_ms ~ SyllPos + (1 | Speaker) + (1 | Word), data = roots_kaz)
summary(model_roots_dur)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Duration_in_ms ~ SyllPos + (1 | Speaker) + (1 | Word)
##    Data: roots_kaz
## 
## REML criterion at convergence: 3535.1
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.75448 -0.68442 -0.08159  0.60448  3.09951 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Word     (Intercept)  276.2   16.62   
##  Speaker  (Intercept)  522.0   22.85   
##  Residual             3508.5   59.23   
## Number of obs: 320, groups:  Word, 40; Speaker, 4
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)  226.529     12.623   3.801  17.946 8.15e-05 ***
## SyllPoss2     52.619      6.622 276.000   7.946 4.94e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr)
## SyllPoss2 -0.262
model_suffixed_dur <- lmer(Duration_in_ms ~ SyllPos + (1 | Speaker) + (1 | Word), data = suffixed_kaz)
summary(model_suffixed_dur)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Duration_in_ms ~ SyllPos + (1 | Speaker) + (1 | Word)
##    Data: suffixed_kaz
## 
## REML criterion at convergence: 5155.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.7199 -0.6445 -0.0443  0.6274  3.5197 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Word     (Intercept)  113.1   10.63   
##  Speaker  (Intercept)  371.2   19.27   
##  Residual             3068.7   55.40   
## Number of obs: 474, groups:  Word, 40; Speaker, 4
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept) 195.9828    10.7268   4.0289  18.270 5.01e-05 ***
## SyllPoss2    -0.4993     6.2142 429.0131  -0.080    0.936    
## SyllPoss3    47.6350     6.2560 430.5744   7.614 1.69e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) SyllP2
## SyllPoss2 -0.292       
## SyllPoss3 -0.290  0.500
model_roots_int <- lmer(Mean_dB ~ SyllPos + (1 | Speaker) + (1 | Word), data = roots_kaz)
summary(model_roots_int)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Mean_dB ~ SyllPos + (1 | Speaker) + (1 | Word)
##    Data: roots_kaz
## 
## REML criterion at convergence: 1738.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.7280 -0.6694  0.1156  0.6561  2.3317 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Word     (Intercept)  3.367   1.835   
##  Speaker  (Intercept) 18.763   4.332   
##  Residual             11.011   3.318   
## Number of obs: 320, groups:  Word, 40; Speaker, 4
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  61.8897     2.2008   3.1528  28.121 6.83e-05 ***
## SyllPoss2    -0.1264     0.3710 276.0001  -0.341    0.734    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr)
## SyllPoss2 -0.084
model_suffixed_int <- lmer(Mean_dB ~ SyllPos + (1 | Speaker) + (1 | Word), data = suffixed_kaz)
summary(model_suffixed_int)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Mean_dB ~ SyllPos + (1 | Speaker) + (1 | Word)
##    Data: suffixed_kaz
## 
## REML criterion at convergence: 2566.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2422 -0.6139  0.1288  0.6423  2.5559 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Word     (Intercept)  5.631   2.373   
##  Speaker  (Intercept)  3.677   1.918   
##  Residual             10.957   3.310   
## Number of obs: 474, groups:  Word, 40; Speaker, 4
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  68.3941     1.0628   4.2990  64.353  1.4e-07 ***
## SyllPoss2     1.0379     0.3714 429.0521   2.794  0.00544 ** 
## SyllPoss3     0.2492     0.3743 429.4918   0.666  0.50594    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) SyllP2
## SyllPoss2 -0.176       
## SyllPoss3 -0.175  0.499
model_roots_f0 <- lmer(MeanF0 ~ SyllPos + (1 | Speaker) + (1 | Word), data = roots_kaz)
summary(model_roots_f0)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: MeanF0 ~ SyllPos + (1 | Speaker) + (1 | Word)
##    Data: roots_kaz
## 
## REML criterion at convergence: 2611
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.9693 -0.6668  0.0422  0.5273  3.2964 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Word     (Intercept)    4.152  2.038  
##  Speaker  (Intercept) 4772.994 69.087  
##  Residual              200.816 14.171  
## Number of obs: 318, groups:  Word, 40; Speaker, 4
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)  162.760     34.563   3.004   4.709   0.0181 *  
## SyllPoss2    -14.998      1.590 273.970  -9.435   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr)
## SyllPoss2 -0.023
model_suffixed_f0 <- lmer(MeanF0 ~ SyllPos + (1 | Speaker) + (1 | Word), data = suffixed_kaz)
summary(model_suffixed_f0)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: MeanF0 ~ SyllPos + (1 | Speaker) + (1 | Word)
##    Data: suffixed_kaz
## 
## REML criterion at convergence: 3634.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.7271 -0.4804 -0.0797  0.5228  2.7992 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Word     (Intercept)   13.77   3.711  
##  Speaker  (Intercept) 4511.56  67.168  
##  Residual              133.27  11.544  
## Number of obs: 464, groups:  Word, 40; Speaker, 4
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)  157.017     33.602   3.005   4.673   0.0184 *  
## SyllPoss2     -1.396      1.320 423.975  -1.058   0.2907    
## SyllPoss3     11.264      1.322 424.756   8.518 2.84e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) SyllP2
## SyllPoss2 -0.020       
## SyllPoss3 -0.020  0.510

OBSERVATION: Kazakh Stress and its Correlates

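  • Reference level: s1 (the initial syllable) in both datasets.

| Dataset | Acoustic Measure | SyllPos Effect | Estimate | t-value | p-value | Significance |
|---|---|---|---|---|---|---|
| Roots | Duration (ms) | s2 | +52.62 | 7.95 | < 0.001 | *** |
| Roots | Mean_dB (dB) | s2 | -0.13 | -0.34 | 0.734 | n.s. |
| Roots | MeanF0 (Hz) | s2 | -15.00 | -9.44 | < 0.001 | *** |
| Suffixed | Duration (ms) | s2 | -0.50 | -0.08 | 0.936 | n.s. |
| Suffixed | Duration (ms) | s3 | +47.64 | 7.61 | < 0.001 | *** |
| Suffixed | Mean_dB (dB) | s2 | +1.04 | 2.79 | 0.005 | ** |
| Suffixed | Mean_dB (dB) | s3 | +0.25 | 0.67 | 0.506 | n.s. |
| Suffixed | MeanF0 (Hz) | s2 | -1.40 | -1.06 | 0.291 | n.s. |
| Suffixed | MeanF0 (Hz) | s3 | +11.26 | 8.52 | < 0.001 | *** |
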
  • In uninflected roots, stress appears to fall on the second syllable, as shown by its longer duration (+52.62 ms) and lower pitch (-15.00 Hz) relative to s1. Does pitch play an edge-marking role, as in Uyghur?

  • In inflected forms, stress appears to shift to the suffix, reflected in longer duration (+47.64 ms) and elevated F0 (+11.26 Hz; why?) in the final syllable (s3).

  • Intensity (Mean_dB) is not a consistent cue across word types and positions. This aligns with previous findings that duration is a more robust stress correlate in Kazakh, and it suggests that the role of pitch needs to be reassessed.
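
Summary tables like the one above are easy to mistype when copied by hand. The sketch below shows one way to derive such rows from a coefficient matrix; `tidy_coefs()` is a hypothetical helper, not a function from lme4 or lmerTest, and the matrix is typed in from the `model_roots_dur` output above so that the example runs on its own.

```r
# Fixed-effect rows copied from summary(model_roots_dur) above.
coef_mat <- rbind(
  "(Intercept)" = c(226.529, 12.623, 3.801, 17.946, 8.15e-05),
  "SyllPoss2"   = c(52.619, 6.622, 276.000, 7.946, 4.94e-14)
)
colnames(coef_mat) <- c("Estimate", "Std. Error", "df", "t value", "Pr(>|t|)")

# Turn a coefficient matrix into report-style rows.
tidy_coefs <- function(m, drop_intercept = TRUE) {
  if (drop_intercept) m <- m[rownames(m) != "(Intercept)", , drop = FALSE]
  p <- m[, "Pr(>|t|)"]
  data.frame(
    term     = rownames(m),
    estimate = round(m[, "Estimate"], 2),
    t_value  = round(m[, "t value"], 2),
    p_value  = ifelse(p < 0.001, "< 0.001",
                      formatC(p, format = "f", digits = 3)),
    signif   = as.character(cut(
      p,
      breaks = c(0, 0.001, 0.01, 0.05, 1),
      labels = c("***", "**", "*", "n.s."),
      include.lowest = TRUE
    )),
    row.names = NULL
  )
}

res <- tidy_coefs(coef_mat)
print(res)
```

With the fitted models in the session, the same helper should accept `coef(summary(model_roots_dur))` directly, assuming the lmerTest summary keeps the column names shown in the printed output.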

5.3 Russian: correlates of stress

# Russian subset: keep syllables that have a Stress annotation
rus_all_syll <- df_full_sample %>%
  filter(Language == "Rus", !is.na(Stress))

# Summarize by stress and word form
summary_rus_all <- rus_all_syll %>%
  group_by(Stress, WordForm) %>%
  summarise(
    mean_dur = mean(Duration_in_ms, na.rm = TRUE),
    sd_dur = sd(Duration_in_ms, na.rm = TRUE),
    
    mean_dB = mean(Mean_dB, na.rm = TRUE),
    sd_dB = sd(Mean_dB, na.rm = TRUE),
    
    mean_f0 = mean(MeanF0, na.rm = TRUE),
    sd_f0 = sd(MeanF0, na.rm = TRUE),
    
    n = n(),
    .groups = "drop"
  ) %>%
  mutate(
    se_dur = sd_dur / sqrt(n),
    se_dB = sd_dB / sqrt(n),
    se_f0 = sd_f0 / sqrt(n)
  )

# Prevent NA level from sneaking in
summary_rus_all_complete <- summary_rus_all %>%
  mutate(Stress = as.character(Stress)) %>%
  complete(Stress, WordForm, fill = list(
    mean_dur = NA,
    se_dur = NA,
    mean_dB = NA,
    se_dB = NA,
    mean_f0 = NA,
    se_f0 = NA
  )) %>%
  filter(!is.na(Stress)) %>%
  mutate(Stress = factor(Stress, levels = c("stressed", "unstressed")))  # Explicit order


# Duration Plot
rus_dur <- ggplot(summary_rus_all_complete, aes(x = Stress, y = mean_dur, fill = WordForm)) +
  geom_col(position = position_dodge(width = 0.8), width = 0.7) +
  geom_errorbar(
    aes(ymin = mean_dur - se_dur, ymax = mean_dur + se_dur),
    position = position_dodge(width = 0.8),
    width = 0.2
  ) +
  labs(
    x = "Stressed Syllable",
    y = "Mean Duration (ms)",
    fill = "Word Form"
  ) +
  theme_minimal(base_size = 14) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))


ggsave("rus_duration_plot.png", plot = rus_dur, width = 8, height = 6, dpi = 300, bg = "white")

# Intensity Plot
rus_db <- ggplot(summary_rus_all_complete, aes(x = Stress, y = mean_dB, fill = WordForm)) +
  geom_col(position = position_dodge(width = 0.8), width = 0.7) +
  geom_errorbar(
    aes(ymin = mean_dB - se_dB, ymax = mean_dB + se_dB),
    position = position_dodge(width = 0.8),
    width = 0.2
  ) +
  labs(
    x = "Stressed Syllable",
    y = "Mean Intensity (dB)",
    fill = "Word Form"
  ) +
  theme_minimal(base_size = 14) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))


ggsave("rus_intensity_plot.png", plot = rus_db, width = 8, height = 6, dpi = 300, bg = "white")

# F0 Plot
rus_f0 <- ggplot(summary_rus_all_complete, aes(x = Stress, y = mean_f0, fill = WordForm)) +
  geom_col(position = position_dodge(width = 0.8), width = 0.7) +
  geom_errorbar(
    aes(ymin = mean_f0 - se_f0, ymax = mean_f0 + se_f0),
    position = position_dodge(width = 0.8),
    width = 0.2
  ) +
  labs(
    x = "Stressed Syllable",
    y = "Mean F0 (Hz)",
    fill = "Word Form"
  ) +
  theme_minimal(base_size = 14) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggsave("rus_f0_plot.png", plot = rus_f0, width = 8, height = 6, dpi = 300, bg = "white")

# Horizontal panel with labels A, B, C
# plot_layout() and plot_annotation() come from the patchwork package
library(patchwork)
rus_dur_clean <- rus_dur + theme(axis.title.x = element_blank())
rus_f0_clean <- rus_f0 + theme(axis.title.x = element_blank())

rus_panel_horizontal <- rus_dur_clean + rus_db + rus_f0_clean +
  plot_layout(ncol = 3, guides = "collect") +
  plot_annotation(tag_levels = 'A')

ggsave("rus_panel_horizontal.png", plot = rus_panel_horizontal, width = 12, height = 5, dpi = 300, bg = "white")

rus_panel_horizontal 

# Filter Russian tokens with valid syllable position and stress info
rus_all_syll_posstress <- df_full_sample %>%
  filter(Language == "Rus", !is.na(SyllPos), !is.na(Stress))

# Summarize by syllable position and stress
summary_rus_posstress <- rus_all_syll_posstress %>%
  group_by(SyllPos, Stress) %>%
  summarise(
    mean_dur = mean(Duration_in_ms, na.rm = TRUE),
    sd_dur = sd(Duration_in_ms, na.rm = TRUE),

    mean_dB = mean(Mean_dB, na.rm = TRUE),
    sd_dB = sd(Mean_dB, na.rm = TRUE),

    mean_f0 = mean(MeanF0, na.rm = TRUE),
    sd_f0 = sd(MeanF0, na.rm = TRUE),

    n = n(),
    .groups = "drop"
  ) %>%
  mutate(
    se_dur = sd_dur / sqrt(n),
    se_dB = sd_dB / sqrt(n),
    se_f0 = sd_f0 / sqrt(n)
  )

# Ensure complete combinations and order factors
summary_rus_posstress_complete <- summary_rus_posstress %>%
  complete(SyllPos, Stress, fill = list(
    mean_dur = NA, se_dur = NA,
    mean_dB = NA, se_dB = NA,
    mean_f0 = NA, se_f0 = NA
  )) %>%
  filter(!is.na(SyllPos) & !is.na(Stress)) %>%
  mutate(
    SyllPos = factor(SyllPos, levels = c("s1", "s2", "s3")),
    Stress = factor(Stress, levels = c("stressed", "unstressed"))
  )

# Duration plot
rusps_dur <- ggplot(summary_rus_posstress_complete, aes(x = SyllPos, y = mean_dur, fill = Stress)) +
  geom_col(position = position_dodge(width = 0.8), width = 0.7) +
  geom_errorbar(aes(ymin = mean_dur - se_dur, ymax = mean_dur + se_dur),
                position = position_dodge(width = 0.8), width = 0.2) +
  labs(x = "Syllable Position", y = "Mean Duration (ms)", fill = "Stress") +
  theme_minimal(base_size = 14)

ggsave("rusps_duration_plot.png", plot = rusps_dur, width = 8, height = 6, dpi = 300, bg = "white")
## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_col()`).
# Intensity plot
rusps_db <- ggplot(summary_rus_posstress_complete, aes(x = SyllPos, y = mean_dB, fill = Stress)) +
  geom_col(position = position_dodge(width = 0.8), width = 0.7) +
  geom_errorbar(aes(ymin = mean_dB - se_dB, ymax = mean_dB + se_dB),
                position = position_dodge(width = 0.8), width = 0.2) +
  labs(x = "Syllable Position", y = "Mean Intensity (dB)", fill = "Stress") +
  theme_minimal(base_size = 14)

ggsave("rusps_intensity_plot.png", plot = rusps_db, width = 8, height = 6, dpi = 300, bg = "white")
## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_col()`).
# F0 plot
rusps_f0 <- ggplot(summary_rus_posstress_complete, aes(x = SyllPos, y = mean_f0, fill = Stress)) +
  geom_col(position = position_dodge(width = 0.8), width = 0.7) +
  geom_errorbar(aes(ymin = mean_f0 - se_f0, ymax = mean_f0 + se_f0),
                position = position_dodge(width = 0.8), width = 0.2) +
  labs(x = "Syllable Position", y = "Mean F0 (Hz)", fill = "Stress") +
  theme_minimal(base_size = 14)

ggsave("rusps_f0_plot.png", plot = rusps_f0, width = 8, height = 6, dpi = 300, bg = "white")
## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_col()`).
# Combine into horizontal panel
rusps_dur_clean <- rusps_dur + theme(axis.title.x = element_blank())
rusps_f0_clean <- rusps_f0 + theme(axis.title.x = element_blank())

rusps_panel <- rusps_dur_clean + rusps_db + rusps_f0_clean +
  plot_layout(ncol = 3, guides = "collect") +
  plot_annotation(tag_levels = 'A')

ggsave("rusps_panel_horizontal.png", plot = rusps_panel, width = 12, height = 5, dpi = 300, bg = "white")
## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_col()`).
rusps_panel
## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_col()`).

## Russian: Stress Correlates

model_rus_dur <- lmer(Duration_in_ms ~ Stress + (1 | Speaker) + (1 | Word), data = rus_all_syll)
summary(model_rus_dur)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Duration_in_ms ~ Stress + (1 | Speaker) + (1 | Word)
##    Data: rus_all_syll
## 
## REML criterion at convergence: 4149.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.5322 -0.6343 -0.0307  0.5576  6.9722 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Word     (Intercept) 1287     35.88   
##  Speaker  (Intercept) 1168     34.17   
##  Residual             5108     71.47   
## Number of obs: 361, groups:  Word, 41; Speaker, 4
## 
## Fixed effects:
##                  Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)       279.552     18.899   4.092  14.792 0.000105 ***
## Stressunstressed  -34.179      7.690 322.118  -4.445 1.21e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## Strssnstrss -0.229
model_rus_int <- lmer(Mean_dB ~ Stress + (1 | Speaker) + (1 | Word), data = rus_all_syll)
summary(model_rus_int)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Mean_dB ~ Stress + (1 | Speaker) + (1 | Word)
##    Data: rus_all_syll
## 
## REML criterion at convergence: 1975
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.64402 -0.60711  0.02214  0.57733  2.98646 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Word     (Intercept)  2.501   1.582   
##  Speaker  (Intercept)  2.576   1.605   
##  Residual             12.141   3.484   
## Number of obs: 361, groups:  Word, 41; Speaker, 4
## 
## Fixed effects:
##                  Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)       70.4119     0.8863   4.0326  79.449 1.35e-07 ***
## Stressunstressed  -1.9720     0.3747 323.6696  -5.263 2.58e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## Strssnstrss -0.238
model_rus_f0 <- lmer(MeanF0 ~ Stress + (1 | Speaker) + (1 | Word), data = rus_all_syll)
summary(model_rus_f0)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: MeanF0 ~ Stress + (1 | Speaker) + (1 | Word)
##    Data: rus_all_syll
## 
## REML criterion at convergence: 3224
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.8145 -0.6282 -0.0776  0.5820  3.1984 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Word     (Intercept)    9.298  3.049  
##  Speaker  (Intercept) 2879.566 53.662  
##  Residual              430.933 20.759  
## Number of obs: 360, groups:  Word, 41; Speaker, 4
## 
## Fixed effects:
##                  Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)       163.009     26.888   3.015   6.063 0.008874 ** 
## Stressunstressed   -8.390      2.219 332.885  -3.782 0.000185 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## Strssnstrss -0.047

OBSERVATION: Russian Stress

Stress Effects in Russian Words

| Acoustic Measure | Predictor | Estimate | t-value | p-value | Significance |
|---|---|---|---|---|---|
| Duration (ms) | Stress (unstressed) | -34.18 | -4.45 | 1.21e-05 | *** |
| Intensity (dB) | Stress (unstressed) | -1.97 | -5.26 | 2.58e-07 | *** |
| F0 (Hz) | Stress (unstressed) | -8.39 | -3.78 | 0.00019 | *** |

The linear mixed-effects models indicate that lexical stress in Russian significantly affects all three acoustic correlates: duration, intensity, and fundamental frequency (F0).

  • Duration: Unstressed syllables are, on average, 34.18 ms shorter than stressed ones (p < 0.001), highlighting duration as a robust cue to stress.

  • Intensity (Mean dB): Unstressed syllables are 1.97 dB quieter, which is also statistically significant (p < 0.001), suggesting that loudness is another reliable correlate.

  • F0: Unstressed syllables are 8.39 Hz lower in pitch than stressed ones (p < 0.001), consistent with the expectation that pitch rises under stress.

Together, these results show that duration, intensity, and pitch all significantly differentiate stressed from unstressed syllables in Russian. This supports prior findings that Russian marks stress more strongly than Kazakh across multiple phonetic dimensions. However, these results should be taken with a grain of salt, since the participants are not native speakers of Russian despite their high bilingual proficiency.
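
To put the three effects on their original scales, the sketch below converts the fixed effects reported above into model-implied means for stressed and unstressed syllables. The numbers are copied from the three model summaries; the data frame layout and column names are introduced here for illustration.

```r
# Intercepts (stressed syllables) and Stressunstressed estimates
# copied from model_rus_dur, model_rus_int, and model_rus_f0.
rus <- data.frame(
  measure    = c("Duration (ms)", "Intensity (dB)", "F0 (Hz)"),
  stressed   = c(279.552, 70.4119, 163.009),
  unstr_diff = c(-34.179, -1.9720, -8.390)
)

# Model-implied mean for unstressed syllables.
rus$unstressed <- rus$stressed + rus$unstr_diff

print(rus)
```

These are population-level means with the speaker and word random intercepts set to zero, so individual speakers, whose F0 in particular varies widely, will sit above or below them.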

6 Hypothesis Testing

What the code in this section does: (1) creates subsets of the data frame for CS tokens and for CS plus Kazakh tokens; (2) plots the comparisons for hypotheses A, B, C, and D; (3) fits lmer() models on those subsets to test the hypotheses.

6.1 Plot A

  • s1:s2 ratio of uninflected CS tokens vs. s1:s2 ratio of inflected CS tokens.
  • s1 or s2 is stressed
# Inspect the full dataset first (view() opens an interactive viewer)
view(df_full_sample)

# Filter CS words with s1/s2
cs_tokens <- df_full_sample %>%
  filter(Language == "CS", SyllPos %in% c("s1", "s2")) %>%
  group_by(Filename, Word) %>%
  filter(all(c("s1", "s2") %in% SyllPos)) %>%
  ungroup()

# Keep only tokens that appear once per SyllPos (no duplicate s1/s2)
cs_tokens <- cs_tokens %>%
  group_by(Filename, Word, SyllPos) %>%
  filter(n() == 1) %>%
  ungroup()

# Keep these columns untouched
id_cols <- c("Filename", "Speaker", "Gender", "Word", "Word_beg", "Word_end", "Word_dur_ms", 
             "Language", "SuffixCase", "WordForm", "LatinScript", "Gloss", "WordClass", 
             "StressedSyll", "Declension", "NounGender", "StressShift", "ShiftDirect", "AttestedInCS")

# Measurement columns to pivot: everything except id_cols and SyllPos
pivot_cols <- cs_tokens %>%
  select(-all_of(c(id_cols, "SyllPos"))) %>%
  names()

# Pivot wider
cs_df_wide <- cs_tokens %>%
  pivot_wider(
    id_cols = all_of(id_cols),
    names_from = SyllPos,
    values_from = all_of(pivot_cols),
    names_sep = "_"
  )

# Filter rows where s1 and s2 data are present
cs_df_wide <- cs_df_wide %>%
  filter(
    !is.na(Duration_in_ms_s1) & !is.na(Duration_in_ms_s2),
    !is.na(Mean_dB_s1) & !is.na(Mean_dB_s2),
    !is.na(Max_dB_s1) & !is.na(Max_dB_s2),
    !is.na(MeanF0_s1) & !is.na(MeanF0_s2),
    !is.na(MaxF0Hz_s1) & !is.na(MaxF0Hz_s2)
  )

# Compute ratios
cs_df_wide <- cs_df_wide %>%
  mutate(
    ratio_s1_s2_dur = Duration_in_ms_s1 / Duration_in_ms_s2,
    ratio_mean_int  = Mean_dB_s1 / Mean_dB_s2,
    ratio_max_int   = Max_dB_s1 / Max_dB_s2,
    ratio_mean_f0   = MeanF0_s1 / MeanF0_s2,
    ratio_max_fo    = MaxF0Hz_s1 / MaxF0Hz_s2,
    root            = Word,
    root_stress     = StressedSyll
  )

#view(cs_tokens)
view(cs_df_wide)

# Create the final dataset
cs_roots <- cs_df_wide %>%
  select(Speaker, root, root_stress, WordForm, StressShift, ShiftDirect, ratio_s1_s2_dur, ratio_mean_int, ratio_max_int, ratio_mean_f0, ratio_max_fo)

view(cs_roots)

# Summarize the data
summary_df <- cs_roots %>%
  group_by(root_stress, WordForm) %>%
  summarise(
    mean_ratio_dur  = mean(ratio_s1_s2_dur, na.rm = TRUE),
    sd_ratio_dur    = sd(ratio_s1_s2_dur, na.rm = TRUE),
    
    mean_ratio_int  = mean(ratio_mean_int, na.rm = TRUE),
    sd_ratio_int    = sd(ratio_mean_int, na.rm = TRUE),
    
    mean_ratio_max_int = mean(ratio_max_int, na.rm = TRUE),
    sd_ratio_max_int   = sd(ratio_max_int, na.rm = TRUE),
    
    mean_ratio_f0   = mean(ratio_mean_f0, na.rm = TRUE),
    sd_ratio_f0     = sd(ratio_mean_f0, na.rm = TRUE),
    
    mean_ratio_max_fo = mean(ratio_max_fo, na.rm = TRUE),
    sd_ratio_max_fo   = sd(ratio_max_fo, na.rm = TRUE),
    
    n = n(),
    .groups = "drop"
  ) %>%
  mutate(
    se_ratio_dur  = sd_ratio_dur / sqrt(n),
    se_ratio_int  = sd_ratio_int / sqrt(n),
    se_ratio_max_int = sd_ratio_max_int / sqrt(n),
    se_ratio_f0   = sd_ratio_f0 / sqrt(n),
    se_ratio_max_fo = sd_ratio_max_fo / sqrt(n)
  )
view(summary_df)


# Plot 1: Duration Ratio
cs_ratio_dur <- ggplot(summary_df, aes(x = factor(root_stress), y = mean_ratio_dur, color = WordForm, shape = WordForm)) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "gray50") +
  geom_point(position = position_dodge(width = 0.4), size = 7) +
  geom_errorbar(
    aes(ymin = mean_ratio_dur - se_ratio_dur, ymax = mean_ratio_dur + se_ratio_dur),
    position = position_dodge(width = 0.4),
    width = 0.2
  ) +
  ylim(.7, 1.3) +
  labs(
    x = "Root Stress Position",
    y = "Mean Duration Ratio (s1:s2)",
    color = "Word Form",
    shape = "Word Form"
  ) +
  theme_minimal(base_size = 18) +
  theme(
    axis.title = element_text(size = 18),
    axis.text = element_text(size = 16),
    legend.title = element_text(size = 16),
    legend.text = element_text(size = 14)
  )

#print(cs_ratio_dur)
ggsave("cs_ratio_duration.png", plot = cs_ratio_dur, width = 6, height = 4, dpi = 300, bg = "white")

# Plot 2: Mean Intensity Ratio
cs_ratio_mean_int <- ggplot(summary_df, aes(x = factor(root_stress), y = mean_ratio_int, color = WordForm, shape = WordForm)) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "gray50") +
  geom_point(position = position_dodge(width = 0.4), size = 7) +
  geom_errorbar(
    aes(ymin = mean_ratio_int - se_ratio_int, ymax = mean_ratio_int + se_ratio_int),
    position = position_dodge(width = 0.4),
    width = 0.2
  ) +
  ylim(.7, 1.3) +
  labs(
    x = "Root Stress Position",
    y = "Mean Intensity Ratio (s1:s2)",
    color = "Word Form",
    shape = "Word Form"
  ) +
#scale_color_manual(values = c("uninflected" = "#2ca02c", "inflected" = "#9467bd")) +
#scale_shape_manual(values = c("uninflected" = 15, "inflected" = 18)) +

  theme_minimal(base_size = 18) +
  theme(
    axis.title = element_text(size = 18),
    axis.text = element_text(size = 16),
    legend.title = element_text(size = 16),
    legend.text = element_text(size = 14)
  )


#print(cs_ratio_mean_int)
ggsave("cs_ratio_mean_intensity.png", plot = cs_ratio_mean_int, width = 6, height = 4, dpi = 300, bg = "white")

# Plot 3: Max Intensity Ratio
# cs_ratio_max_int <- ggplot(summary_df, aes(x = factor(root_stress), y = mean_ratio_max_int, color = WordForm, shape = WordForm)) +
#   geom_hline(yintercept = 1, linetype = "dashed", color = "gray50") +
#   geom_point(position = position_dodge(width = 0.4), size = 7) +
#   geom_errorbar(
#     aes(ymin = mean_ratio_max_int - se_ratio_max_int, ymax = mean_ratio_max_int + se_ratio_max_int),
#     position = position_dodge(width = 0.4),
#     width = 0.2
#   ) +
#   ylim(.7, 1.3) +
#   labs(
#     x = "Root Stress Position",
#     y = "Max Intensity Ratio (s1:s2)",
#     color = "Word Form",
#     shape = "Word Form"
#   ) +
#   theme_minimal(base_size = 18) +
#   theme(
#     axis.title = element_text(size = 18),
#     axis.text = element_text(size = 16),
#     legend.title = element_text(size = 16),
#     legend.text = element_text(size = 14)
#   )
# 
# print(cs_ratio_max_int)
# ggsave("cs_ratio_max_intensity.png", plot = cs_ratio_max_int, width = 6, height = 4, dpi = 300)

# Plot 4: Mean F0 Ratio
cs_ratio_mean_f0 <- ggplot(summary_df, aes(x = factor(root_stress), y = mean_ratio_f0, color = WordForm, shape = WordForm)) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "gray50") +
  geom_point(position = position_dodge(width = 0.4), size = 7) +
  geom_errorbar(
    aes(ymin = mean_ratio_f0 - se_ratio_f0, ymax = mean_ratio_f0 + se_ratio_f0),
    position = position_dodge(width = 0.4),
    width = 0.2
  ) +
  ylim(.7, 1.3) +
  labs(
    x = "Root Stress Position",
    y = "Mean F0 Ratio (s1:s2)",
    color = "Word Form",
    shape = "Word Form"
  ) +
  #scale_color_manual(values = c("uninflected" = "#ff7f0e", "inflected" = "#e377c2")) +
#scale_shape_manual(values = c("uninflected" = 8, "inflected" = 4)) +

  theme_minimal(base_size = 18) +
  theme(
    axis.title = element_text(size = 18),
    axis.text = element_text(size = 16),
    legend.title = element_text(size = 16),
    legend.text = element_text(size = 14)
  )

#print(cs_ratio_mean_f0)
ggsave("cs_ratio_mean_f0.png", plot = cs_ratio_mean_f0, width = 6, height = 4, dpi = 300, bg = "white")

# Plot 5: Max F0 Ratio
# cs_ratio_max_f0 <- ggplot(summary_df, aes(x = factor(root_stress), y = mean_ratio_max_fo, color = WordForm, shape = WordForm)) +
#   geom_hline(yintercept = 1, linetype = "dashed", color = "gray50") +
#   geom_point(position = position_dodge(width = 0.4), size = 7) +
#   geom_errorbar(
#     aes(ymin = mean_ratio_max_fo - se_ratio_max_fo, ymax = mean_ratio_max_fo + se_ratio_max_fo),
#     position = position_dodge(width = 0.4),
#     width = 0.2
#   ) +
#   ylim(.7, 1.3) +
#   labs(
#     x = "Root Stress Position",
#     y = "Max F0 Ratio (s1:s2)",
#     color = "Word Form",
#     shape = "Word Form"
#   ) +
#   theme_minimal(base_size = 18) +
#   theme(
#     axis.title = element_text(size = 18),
#     axis.text = element_text(size = 16),
#     legend.title = element_text(size = 16),
#     legend.text = element_text(size = 14)
#   )
# 
# print(cs_ratio_max_f0)
# ggsave("cs_ratio_max_f0.png", plot = cs_ratio_max_f0, width = 6, height = 4, dpi = 300)

# Combine into horizontal panel
cs_ratio_dur_clean <- cs_ratio_dur + theme(axis.title.x = element_blank())
cs_ratio_mean_f0_clean <- cs_ratio_mean_f0 + theme(axis.title.x = element_blank())

cs_panel_plotA <- cs_ratio_dur_clean + cs_ratio_mean_int + cs_ratio_mean_f0_clean +
  plot_layout(ncol = 3, guides = "collect") +
  plot_annotation(tag_levels = 'A')

ggsave("plotA_panel_horizontal.png", plot = cs_panel_plotA, width = 12, height = 5, dpi = 300, bg = "white")

cs_panel_plotA
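
To make the s1:s2 ratio plotted above concrete, here is a minimal sketch on made-up numbers. The data frame `toy` and its duration values are illustrative assumptions, not measurements from the study, and the base-R `merge()` call only stands in for the `pivot_wider()` step. A ratio above 1 (the dashed reference line) means s1 is longer than s2; a ratio below 1 means s2 is longer.

```r
# Toy long-format data: one row per syllable (values invented for illustration)
toy <- data.frame(
  Word           = c("reka", "reka", "gora", "gora"),
  SyllPos        = c("s1", "s2", "s1", "s2"),
  Duration_in_ms = c(180, 240, 210, 200)
)

# Put s1 and s2 side by side (base-R stand-in for the pivot_wider() step)
s1 <- toy[toy$SyllPos == "s1", c("Word", "Duration_in_ms")]
s2 <- toy[toy$SyllPos == "s2", c("Word", "Duration_in_ms")]
toy_wide <- merge(s1, s2, by = "Word", suffixes = c("_s1", "_s2"))

# Same ratio definition as ratio_s1_s2_dur in the script
toy_wide$ratio_s1_s2_dur <- toy_wide$Duration_in_ms_s1 / toy_wide$Duration_in_ms_s2
toy_wide
```

In this toy case "reka" falls below the reference line (s2 longer) and "gora" falls slightly above it (s1 longer).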

6.2 Model A

### Run lmer() on the subset of dataset

# Test H1: Stress remains fixed on the root
head(cs_roots) 
# 'uninflected' is the reference level of WordForm; the 'inflected' coefficient
# in the output is estimated relative to it.

# Convert WordForm to a factor (it has two categorical levels)
cs_roots$WordForm <- factor(cs_roots$WordForm)  
# Set ref level
cs_roots$WordForm <- relevel(cs_roots$WordForm, ref = "uninflected")

model_a_dur <- lmer(ratio_s1_s2_dur ~ WordForm*factor(root_stress) + (1|root) +(1|Speaker), data=cs_roots)
summary(model_a_dur)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: ratio_s1_s2_dur ~ WordForm * factor(root_stress) + (1 | root) +  
##     (1 | Speaker)
##    Data: cs_roots
## 
## REML criterion at convergence: 291.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.7190 -0.5276 -0.1247  0.4397  3.3426 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  root     (Intercept) 0.172647 0.41551 
##  Speaker  (Intercept) 0.004828 0.06949 
##  Residual             0.083040 0.28817 
## Number of obs: 302, groups:  root, 80; Speaker, 4
## 
## Fixed effects:
##                                        Estimate Std. Error       df t value
## (Intercept)                             1.09456    0.10508 60.18736  10.417
## WordForminflected                       0.09193    0.14009 74.39859   0.656
## factor(root_stress)2                   -0.28313    0.14009 74.56604  -2.021
## WordForminflected:factor(root_stress)2  0.09705    0.19775 73.96366   0.491
##                                        Pr(>|t|)    
## (Intercept)                            4.29e-15 ***
## WordForminflected                        0.5137    
## factor(root_stress)2                     0.0469 *  
## WordForminflected:factor(root_stress)2   0.6250    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) WrdFrm fc(_)2
## WrdFrmnflct -0.668              
## fctr(rt_s)2 -0.668  0.501       
## WrdFrm:(_)2  0.473 -0.708 -0.708
model_a_int <- lmer(ratio_mean_int ~ WordForm*factor(root_stress) + (1|root) +(1|Speaker), data=cs_roots)
summary(model_a_int)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: ratio_mean_int ~ WordForm * factor(root_stress) + (1 | root) +  
##     (1 | Speaker)
##    Data: cs_roots
## 
## REML criterion at convergence: -693
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.32995 -0.56131 -0.02718  0.60862  2.70449 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  root     (Intercept) 0.004257 0.06525 
##  Speaker  (Intercept) 0.000353 0.01879 
##  Residual             0.003394 0.05826 
## Number of obs: 302, groups:  root, 80; Speaker, 4
## 
## Fixed effects:
##                                        Estimate Std. Error       df t value
## (Intercept)                             1.07401    0.01871 29.29282  57.403
## WordForminflected                      -0.05681    0.02284 76.24030  -2.488
## factor(root_stress)2                   -0.06925    0.02284 76.67589  -3.032
## WordForminflected:factor(root_stress)2  0.04370    0.03221 75.69390   1.357
##                                        Pr(>|t|)    
## (Intercept)                             < 2e-16 ***
## WordForminflected                       0.01504 *  
## factor(root_stress)2                    0.00332 ** 
## WordForminflected:factor(root_stress)2  0.17890    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) WrdFrm fc(_)2
## WrdFrmnflct -0.613              
## fctr(rt_s)2 -0.613  0.502       
## WrdFrm:(_)2  0.434 -0.709 -0.709
model_a_f0 <- lmer(ratio_mean_f0 ~ WordForm*factor(root_stress) + (1|root) +(1|Speaker), data=cs_roots)
## boundary (singular) fit: see help('isSingular')
summary(model_a_f0)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: ratio_mean_f0 ~ WordForm * factor(root_stress) + (1 | root) +  
##     (1 | Speaker)
##    Data: cs_roots
## 
## REML criterion at convergence: -467.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.7942 -0.5665  0.0018  0.5803  3.1138 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  root     (Intercept) 0.000000 0.00000 
##  Speaker  (Intercept) 0.001401 0.03743 
##  Residual             0.011232 0.10598 
## Number of obs: 302, groups:  root, 80; Speaker, 4
## 
## Fixed effects:
##                                         Estimate Std. Error        df t value
## (Intercept)                              1.08789    0.02250   5.10420  48.345
## WordForminflected                       -0.03075    0.01738 295.00693  -1.770
## factor(root_stress)2                    -0.04211    0.01756 295.07875  -2.399
## WordForminflected:factor(root_stress)2  -0.02429    0.02442 295.05858  -0.995
##                                        Pr(>|t|)    
## (Intercept)                            5.42e-08 ***
## WordForminflected                        0.0778 .  
## factor(root_stress)2                     0.0171 *  
## WordForminflected:factor(root_stress)2   0.3207    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) WrdFrm fc(_)2
## WrdFrmnflct -0.399              
## fctr(rt_s)2 -0.395  0.512       
## WrdFrm:(_)2  0.284 -0.712 -0.719
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

OBSERVATION: Model_A

Model A: Duration Ratio

Formula: ratio_s1_s2_dur ~ WordForm * factor(root_stress) + (1|root) + (1|Speaker)

| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 1.095 | < .001 | Baseline duration ratio for uninflected words with stress on s1 |
| WordForminflected | 0.092 | .514 | Among s1-stressed roots, inflected words show a slightly higher s1:s2 duration ratio (not significant) |
| factor(root_stress)2 | –0.283 | .047 | Among uninflected words, stress on s2 results in a lower s1:s2 duration ratio (s1 is relatively shorter) |
| WordForminflected:factor(root_stress)2 | 0.097 | .625 | No significant interaction effect |

Model A: Mean Intensity Ratio

Formula: ratio_mean_int ~ WordForm * factor(root_stress) + (1|root) + (1|Speaker)

| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 1.074 | < .001 | Baseline intensity ratio for uninflected words with stress on s1 |
| WordForminflected | –0.057 | .015 | Among s1-stressed roots, inflected words show a significantly lower s1:s2 intensity ratio |
| factor(root_stress)2 | –0.069 | .003 | Among uninflected words, stress on s2 lowers the s1:s2 intensity ratio |
| WordForminflected:factor(root_stress)2 | 0.044 | .179 | No significant interaction effect |

Model A: Mean F0 Ratio

Formula: ratio_mean_f0 ~ WordForm * factor(root_stress) + (1|root) + (1|Speaker)

| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 1.088 | < .001 | Baseline F0 ratio for uninflected words with stress on s1 |
| WordForminflected | –0.031 | .078 | Among s1-stressed roots, inflected words show a marginally lower F0 ratio (not significant at .05) |
| factor(root_stress)2 | –0.042 | .017 | Among uninflected words, stress on s2 lowers the s1:s2 F0 ratio |
| WordForminflected:factor(root_stress)2 | –0.024 | .321 | No significant interaction effect |

Summary

| Feature | Mean Duration | Mean Intensity | Mean F0 |
|---|---|---|---|
| Stress position effect | Significant (decrease in s1:s2 ratio) | Significant (decrease in s1:s2 ratio) | Significant (decrease in s1:s2 ratio) |
| Inflection effect | Not significant | Significant (decrease in s1:s2 ratio) | Not significant |
| Interaction (Stress × WordForm) | Not significant | Not significant | Not significant |
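
Because the model uses R's default treatment coding, the coefficients above are offsets from the reference cell (uninflected, root stress on s1) rather than cell means. The sketch below, offered only as a reading aid, adds the reported duration estimates together to recover the predicted s1:s2 ratio for each of the four cells; the variable names are illustrative and the numbers are copied from the `model_a_dur` output.

```r
# Fixed-effect estimates copied from summary(model_a_dur)
b0     <- 1.09456   # (Intercept): uninflected, root stress on s1
b_infl <- 0.09193   # WordForminflected
b_s2   <- -0.28313  # factor(root_stress)2
b_int  <- 0.09705   # WordForminflected:factor(root_stress)2

# Predicted s1:s2 duration ratio for each cell under treatment coding
pred_dur <- c(
  uninflected_s1 = b0,
  inflected_s1   = b0 + b_infl,
  uninflected_s2 = b0 + b_s2,
  inflected_s2   = b0 + b_infl + b_s2 + b_int
)
round(pred_dur, 3)
```

Read this way, the predicted ratio for uninflected s2-stressed roots is about 0.81, while for inflected s2-stressed roots it is about 1.00, although neither the inflection effect nor the interaction is significant for duration.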

6.3 Plot B

  • CS tokens by Stress position and WordForm
## Plot B
cs_all_syll <- df_full_sample %>%
  filter(
    Language == "CS",
    WordForm == "inflected",
    !is.na(SyllPos)
  )

view(cs_all_syll)

# Plot with error bars (b) == duration of s1, s2, s3 by Stress and WordForm

# Summarize duration, intensity, and F0
summary_cs_all <- cs_all_syll %>%
  group_by(StressedSyll, SyllPos) %>%
  summarise(
    mean_dur = mean(Duration_in_ms, na.rm = TRUE),
    sd_dur = sd(Duration_in_ms, na.rm = TRUE),

    mean_dB = mean(Mean_dB, na.rm = TRUE),
    sd_dB = sd(Mean_dB, na.rm = TRUE),

    mean_f0 = mean(MeanF0, na.rm = TRUE),
    sd_f0 = sd(MeanF0, na.rm = TRUE),

    n = n(),
    .groups = "drop"
  ) %>%
  mutate(
    se_dur = sd_dur / sqrt(n),
    se_dB = sd_dB / sqrt(n),
    se_f0 = sd_f0 / sqrt(n)
  )



# Neutral grey palette
#grey_palette <- c("s1" = "#999999", "s2" = "#666666", "s3" = "#333333")

# Shared minimalist theme (legends hidden on each plot; one collected legend is restored on the panel)
shared_theme <- theme_minimal(base_size = 18) +
  theme(
    axis.title = element_text(size = 18),
    axis.text = element_text(size = 16),
    legend.title = element_blank(),
    legend.text = element_text(size = 10),
    legend.position = "none"
  )

# Plot 1: Duration
cs_s1s2s3_plot <- ggplot(summary_cs_all, aes(x = factor(StressedSyll), y = mean_dur, fill = SyllPos)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  geom_errorbar(
    aes(ymin = mean_dur - se_dur, ymax = mean_dur + se_dur),
    position = position_dodge(width = 0.9),
    width = 0.2
  ) +
  # scale_fill_manual(values = grey_palette) +
  labs(
    x = "Root Stress",
    y = "Duration (ms)"
  ) +
  shared_theme

# Plot 2: Intensity
cs_s1s2s3_intensity_plot <- ggplot(summary_cs_all, aes(x = factor(StressedSyll), y = mean_dB, fill = SyllPos)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  geom_errorbar(
    aes(ymin = mean_dB - se_dB, ymax = mean_dB + se_dB),
    position = position_dodge(width = 0.9),
    width = 0.2
  ) +
  # scale_fill_manual(values = grey_palette) +
  labs(
    x = "Root Stress",
    y = "Intensity (dB)"
  ) +
  shared_theme


# Plot 3: F0 (with legend)
cs_s1s2s3_f0_plot <- ggplot(summary_cs_all, aes(x = factor(StressedSyll), y = mean_f0, fill = SyllPos)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  geom_errorbar(
    aes(ymin = mean_f0 - se_f0, ymax = mean_f0 + se_f0),
    position = position_dodge(width = 0.9),
    width = 0.2
  ) +
  # scale_fill_manual(values = grey_palette) +
  labs(
    x = "Root Stress",
    y = "F0 (Hz)",
    fill = "Syllable Position"
  ) +
  shared_theme

# Combine horizontally with shared legend
cs_s1s2s3_plot_clean <- cs_s1s2s3_plot + theme(axis.title.x = element_blank())
cs_s1s2s3_f0_plot_clean <- cs_s1s2s3_f0_plot + theme(axis.title.x = element_blank())

cs_panel_horizontal <- (cs_s1s2s3_plot_clean | cs_s1s2s3_intensity_plot | cs_s1s2s3_f0_plot_clean) +
  plot_layout(ncol = 3, guides = "collect") &
  theme(legend.position = "bottom")

print(cs_panel_horizontal)

# Save output
ggsave("cs_s1s2s3_panel_horizontal.png", plot = cs_panel_horizontal, width = 15, height = 5, dpi = 400, bg = "white")

# Print for inspection


# Create vertical layout
# Duration plot (no x-axis label, no legend)
# cs_s1s2s3_plot_clean <- cs_s1s2s3_plot +
#   labs(x = NULL) +
#   theme(legend.position = "none")

# Intensity plot (no x-axis label, no legend)
# cs_s1s2s3_intensity_plot_clean <- cs_s1s2s3_intensity_plot +
#   labs(x = NULL) +
#   theme(legend.position = "none")

# F0 plot (with x-axis label and shared legend)
# cs_s1s2s3_f0_plot_clean <- cs_s1s2s3_f0_plot +
#   labs(x = "Root Stress Position") +
#   theme(legend.position = "bottom")

# Combine plots vertically
# cs_panel_vertical_clean <- cs_s1s2s3_plot_clean /
#                            cs_s1s2s3_intensity_plot_clean /
#                            cs_s1s2s3_f0_plot_clean +
#   plot_layout(guides = "collect") &
#   theme(legend.position = "bottom")

# Save cleaned panel
# ggsave("cs_s1s2s3_panel_vertical_clean.png", plot = cs_panel_vertical_clean,
#        width = 7, height = 12, dpi = 400, bg = "white")

# Print for review
# print(cs_panel_vertical_clean)
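
The summaries above compute standard errors as `sd / sqrt(n)` with `n = n()`. Because `n()` counts every row in a group, including rows where a measurement is `NA`, while `sd(..., na.rm = TRUE)` ignores those rows, the two can disagree whenever values are missing. A minimal sketch of an NA-aware alternative follows; the helper name `se_na` and the example values are hypothetical and not part of the original script.

```r
# Hypothetical helper: standard error of the mean using only non-missing values
se_na <- function(x) {
  n_obs <- sum(!is.na(x))            # number of observed (non-NA) values
  sd(x, na.rm = TRUE) / sqrt(n_obs)
}

x <- c(200, 220, 240, NA)            # invented durations with one missing value

se_rows <- sd(x, na.rm = TRUE) / sqrt(length(x))  # what sd / sqrt(n()) would give
se_obs  <- se_na(x)                               # divides by 3 observed values, not 4 rows

c(se_rows = se_rows, se_obs = se_obs)
```

Inside `summarise()`, the same idea could be written as a per-measure count such as `n_dur = sum(!is.na(Duration_in_ms))`. If no values are missing, both versions give the same result.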

6.4 Model B

# Test H2: Stress follows Kazakh rules
# If H2 is true, s3 should be significantly longer than s1 and s2.
# The dataset contains durations of s1, s2, and s3.
head(cs_all_syll)
# Initial model b for comparing s1,s2 and s3 
# model_b <- lmer(Duration_in_ms ~ SyllPos*Stress + (1|Speaker), data=cs_all_syll)
# summary(model_b)

# reference level - s1 stressed vs s3
# reference level - s2 stressed vs s3
# comparing positional difference based on root stress 
# new code below taking into account above comments:

## s1 vs s3
# Recode NA as "no_stress" 
cs_all_syll$Stress <- as.character(cs_all_syll$Stress)
cs_all_syll$Stress[is.na(cs_all_syll$Stress)] <- "no_stress"
cs_all_syll$Stress <- factor(cs_all_syll$Stress, levels = c("stressed", "unstressed", "no_stress"))

# Filter only s1 and s3 rows
cs_s1_s3_new <- cs_all_syll %>%
  filter(SyllPos %in% c("s1", "s3"))

# Identify words where s1 is stressed
words_with_stressed_s1 <- cs_s1_s3_new %>%
  filter(SyllPos == "s1", Stress == "stressed") %>%
  pull(Word) %>% unique()

# Keep s1 and s3 syllables only from those words
cs_s1_s3_stressed_new <- cs_s1_s3_new %>%
  filter(Word %in% words_with_stressed_s1)
cs_s1_s3_stressed_new
# Fit the model (model B_1, stressed s1 vs. s3 no_stress)
model_s1_vs_s3_dur <- lmer(Duration_in_ms ~ factor(SyllPos) + (1 | Speaker), data = cs_s1_s3_stressed_new)
## boundary (singular) fit: see help('isSingular')
summary(model_s1_vs_s3_dur)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Duration_in_ms ~ factor(SyllPos) + (1 | Speaker)
##    Data: cs_s1_s3_stressed_new
## 
## REML criterion at convergence: 1845.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2060 -0.7044 -0.0497  0.4628  3.5991 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Speaker  (Intercept)    0      0.00   
##  Residual             4589     67.74   
## Number of obs: 165, groups:  Speaker, 4
## 
## Fixed effects:
##                   Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)       222.8084     7.3910 163.0000   30.15   <2e-16 ***
## factor(SyllPos)s3   0.5325    10.5487 163.0000    0.05     0.96    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## fctr(SylP)3 -0.701
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
model_s1_vs_s3_int <- lmer(Mean_dB ~ factor(SyllPos) + (1 | Speaker), data = cs_s1_s3_stressed_new)
summary(model_s1_vs_s3_int)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Mean_dB ~ factor(SyllPos) + (1 | Speaker)
##    Data: cs_s1_s3_stressed_new
## 
## REML criterion at convergence: 969
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.6219 -0.4066  0.0381  0.6665  2.0886 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Speaker  (Intercept)  3.058   1.749   
##  Residual             20.414   4.518   
## Number of obs: 165, groups:  Speaker, 4
## 
## Fixed effects:
##                   Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)        70.1355     1.0037   3.8656  69.877 3.85e-07 ***
## factor(SyllPos)s3  -2.7805     0.7036 160.0103  -3.952 0.000116 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## fctr(SylP)3 -0.344
model_s1_vs_s3_f0 <- lmer(MeanF0 ~ factor(SyllPos) + (1 | Speaker), data = cs_s1_s3_stressed_new)
summary(model_s1_vs_s3_f0)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: MeanF0 ~ factor(SyllPos) + (1 | Speaker)
##    Data: cs_s1_s3_stressed_new
## 
## REML criterion at convergence: 1445.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3310 -0.5438 -0.0116  0.7014  2.5875 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Speaker  (Intercept) 3590.1   59.92   
##  Residual              353.1   18.79   
## Number of obs: 165, groups:  Speaker, 4
## 
## Fixed effects:
##                   Estimate Std. Error      df t value Pr(>|t|)  
## (Intercept)        160.376     30.029   3.014   5.341   0.0127 *
## factor(SyllPos)s3   -2.833      2.927 160.000  -0.968   0.3344  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## fctr(SylP)3 -0.048

## s2 vs s3
# Filter only s2 and s3 rows
cs_s2_s3_new <- cs_all_syll %>%
  filter(SyllPos %in% c("s2", "s3"))

# Identify words where s2 is stressed
words_with_stressed_s2 <- cs_s2_s3_new %>%
  filter(SyllPos == "s2", Stress == "stressed") %>%
  pull(Word) %>% unique()

# Keep s2 and s3 syllables only from those words
cs_s2_s3_stressed_new <- cs_s2_s3_new %>%
  filter(Word %in% words_with_stressed_s2)
cs_s2_s3_stressed_new
# Fit the model (model B_2, stressed s2 vs. s3 no_stress)
model_s2_vs_s3_dur <- lmer(Duration_in_ms ~ factor(SyllPos) + (1 | Speaker), data = cs_s2_s3_stressed_new)
summary(model_s2_vs_s3_dur)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Duration_in_ms ~ factor(SyllPos) + (1 | Speaker)
##    Data: cs_s2_s3_stressed_new
## 
## REML criterion at convergence: 1856.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.0340 -0.7360 -0.0397  0.6178  3.3429 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Speaker  (Intercept)  466.2   21.59   
##  Residual             3860.2   62.13   
## Number of obs: 168, groups:  Speaker, 4
## 
## Fixed effects:
##                   Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)        238.700     12.748   4.067  18.725 4.23e-05 ***
## factor(SyllPos)s3    2.131      9.587 163.000   0.222    0.824    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## fctr(SylP)3 -0.376
model_s2_vs_s3_int <- lmer(Mean_dB ~ factor(SyllPos) + (1 | Speaker), data = cs_s2_s3_stressed_new)
summary(model_s2_vs_s3_int)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Mean_dB ~ factor(SyllPos) + (1 | Speaker)
##    Data: cs_s2_s3_stressed_new
## 
## REML criterion at convergence: 902.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.0611 -0.5299  0.0471  0.6822  2.2831 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Speaker  (Intercept)  3.536   1.881   
##  Residual             12.174   3.489   
## Number of obs: 168, groups:  Speaker, 4
## 
## Fixed effects:
##                   Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)        70.0537     1.0144   3.4714  69.059 1.43e-06 ***
## factor(SyllPos)s3  -0.8140     0.5384 163.0000  -1.512    0.133    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## fctr(SylP)3 -0.265
model_s2_vs_s3_f0 <- lmer(MeanF0 ~ factor(SyllPos) + (1 | Speaker), data = cs_s2_s3_stressed_new)
summary(model_s2_vs_s3_f0)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: MeanF0 ~ factor(SyllPos) + (1 | Speaker)
##    Data: cs_s2_s3_stressed_new
## 
## REML criterion at convergence: 1328.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -6.5301 -0.4348 -0.0686  0.6563  2.5248 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Speaker  (Intercept) 4183.4   64.68   
##  Residual              146.3   12.09   
## Number of obs: 168, groups:  Speaker, 4
## 
## Fixed effects:
##                   Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)        154.460     32.367   3.005   4.772   0.0174 *  
## factor(SyllPos)s3    9.618      1.866 163.000   5.154 7.31e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## fctr(SylP)3 -0.029

OBSERVATION: Model_B

Model B: Duration (s1_stressed vs s3)

Formula: Duration_in_ms ~ factor(SyllPos) + (1 | Speaker)

| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 222.81 | < .001 | Baseline duration for s1 syllables is approximately 223 ms |
| factor(SyllPos)s3 | 0.53 | .960 | No significant duration difference; s3 duration is nearly identical to s1 |

Model B: Intensity (s1_stressed vs s3)

Formula: Mean_dB ~ factor(SyllPos) + (1 | Speaker)

| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 70.14 | < .001 | Baseline intensity for s1 syllables is approximately 70.1 dB |
| factor(SyllPos)s3 | –2.78 | < .001 | s3 syllables are significantly less intense than s1 by ~2.8 dB |

Model B: F0 (s1_stressed vs s3)

Formula: MeanF0 ~ factor(SyllPos) + (1 | Speaker)

| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 160.38 | .013 | Baseline F0 for s1 syllables is approximately 160 Hz |
| factor(SyllPos)s3 | –2.83 | .334 | s3 syllables show no significant F0 difference compared to s1 |

Summary

| Feature | Duration | Intensity | F0 |
|---|---|---|---|
| s3 effect | Not significant (p = .960) | Significant (p < .001) | Not significant (p = .334) |
| Estimate | +0.53 ms | –2.78 dB | –2.83 Hz |
| Interpretation | No change from s1 | s3 has lower intensity | No meaningful F0 difference |

Model B: Duration (s2_stressed vs s3)

Formula: Duration_in_ms ~ factor(SyllPos) + (1 | Speaker)

| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 238.70 | < .001 | Baseline duration for s2 syllables is approximately 239 ms |
| factor(SyllPos)s3 | 2.13 | .824 | s3 duration is not significantly different from s2 |

Model B: Intensity (s2_stressed vs s3)

Formula: Mean_dB ~ factor(SyllPos) + (1 | Speaker)

| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 70.05 | < .001 | Baseline intensity for s2 syllables is approximately 70.1 dB |
| factor(SyllPos)s3 | –0.81 | .133 | s3 is ~0.8 dB less intense, but this difference is not significant |

Model B: F0 (s2_stressed vs s3)

Formula: MeanF0 ~ factor(SyllPos) + (1 | Speaker)

| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 154.46 | .017 | Baseline F0 for s2 syllables is approximately 154 Hz |
| factor(SyllPos)s3 | +9.62 | < .001 | s3 syllables show a significantly higher F0 (~9.6 Hz) compared to s2 |

Summary

| Feature | Duration | Intensity | F0 |
|---|---|---|---|
| s3 effect | Not significant (p = .824) | Not significant (p = .133) | Significant increase (p < .001) |
| Estimate | +2.13 ms | –0.81 dB | +9.62 Hz |
| Interpretation | No meaningful change | No meaningful change | s3 has notably higher pitch |
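
As a sanity check on the Model B tables, each t value in the `lmer()` output is simply the estimate divided by its standard error. The sketch below recomputes it for the s3 F0 effect in the s2-stressed model and adds a rough 95% Wald interval. The 1.96 multiplier is a normal approximation assumed here for illustration; it is not something the original analysis reports.

```r
# Values copied from summary(model_s2_vs_s3_f0)
est <- 9.618   # factor(SyllPos)s3 estimate (Hz)
se  <- 1.866   # its standard error

t_val   <- est / se                      # should agree with the reported t value (5.154)
wald_ci <- est + c(-1, 1) * 1.96 * se    # approximate 95% interval (normal approximation)

round(t_val, 3)
round(wald_ci, 2)
```

The interval excludes zero, which agrees with the significant F0 increase on s3 reported for s2-stressed roots. With only four speakers, an interval based on the model's own degrees of freedom may differ somewhat.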

6.5 Plot C

  • s3 difference in Kaz and CS tokens
## Plot C == s3 difference in Kaz and CS tokens 

kz_cs_df <- df_full_sample %>%
  filter(
    Language %in% c("CS", "Kaz"),
    SyllPos == "s3"   # this condition also drops rows where SyllPos is NA
  )
#view(kz_cs_df)

summary_kz_cs <- kz_cs_df %>%
  group_by(Language) %>%
  summarise(
    mean_dur = mean(Duration_in_ms, na.rm = TRUE),
    sd_dur   = sd(Duration_in_ms, na.rm = TRUE),

    mean_dB  = mean(Mean_dB, na.rm = TRUE),
    sd_dB    = sd(Mean_dB, na.rm = TRUE),

    mean_f0  = mean(MeanF0, na.rm = TRUE),
    sd_f0    = sd(MeanF0, na.rm = TRUE),

    n = n(),
    .groups = "drop"
  ) %>%
  mutate(
    se_dur = sd_dur / sqrt(n),
    se_dB  = sd_dB / sqrt(n),
    se_f0  = sd_f0 / sqrt(n)
  )

# Shared color palette and theme
# fill_colors <- c("Kaz" = "#666666", "CS" = "#a6cee3")

base_theme <- theme_minimal(base_size = 16) +
  theme(
    axis.title = element_text(size = 16),
    axis.text = element_text(size = 14),
    legend.position = "none"
  )

# Duration plot
p_dur <- ggplot(summary_kz_cs, aes(x = Language, y = mean_dur, fill = Language)) +
  geom_bar(stat = "identity", position = position_dodge(0.4), width = 0.4) +
  geom_errorbar(aes(ymin = mean_dur - se_dur, ymax = mean_dur + se_dur),
                width = 0.2, position = position_dodge(0.4)) +
  # scale_fill_manual(values = fill_colors) +
  labs(y = "Mean Duration (ms)", x = NULL) +
  base_theme

# Intensity plot
p_dB <- ggplot(summary_kz_cs, aes(x = Language, y = mean_dB, fill = Language)) +
  geom_bar(stat = "identity", position = position_dodge(0.4), width = 0.4) +
  geom_errorbar(aes(ymin = mean_dB - se_dB, ymax = mean_dB + se_dB),
                width = 0.2, position = position_dodge(0.4)) +
  # scale_fill_manual(values = fill_colors) +
  labs(y = "Mean Intensity (dB)", x = NULL) +
  base_theme

# F0 plot
p_f0 <- ggplot(summary_kz_cs, aes(x = Language, y = mean_f0, fill = Language)) +
  geom_bar(stat = "identity", position = position_dodge(0.4), width = 0.4) +
  geom_errorbar(aes(ymin = mean_f0 - se_f0, ymax = mean_f0 + se_f0),
                width = 0.2, position = position_dodge(0.4)) +
  # scale_fill_manual(values = fill_colors) +
  labs(y = "Mean F0 (Hz)", x = NULL) +
  base_theme

# Horizontal panel (A/B/C tags are currently disabled; the caption labels the shared x-axis)
panel_horizontal <- p_dur + p_dB + p_f0 +
  plot_layout(ncol = 3, guides = "collect") & 
  theme(legend.position = "bottom") 
  # plot_annotation(tag_levels = 'A') 
  
panel_horizontal <- panel_horizontal + plot_annotation(
  title = NULL,
  subtitle = NULL,
  caption = "Language"
)

print(panel_horizontal)

ggsave("kz_cs_panel_horizontal_tagged.png", panel_horizontal, width = 15, height = 5, dpi = 400, bg = "white")

# Vertical panel with A, B, C annotations
# panel_vertical <- p_dur / p_dB / p_f0 +
#   plot_layout(ncol = 1, guides = "collect") +
#   plot_annotation(tag_levels = 'A')
# 
# ggsave("kz_cs_panel_vertical_tagged.png", panel_vertical, width = 6, height = 12, dpi = 400, bg = "white")

# Print both for preview

#print(panel_vertical )

6.6 Model C

# Test H3: A mix of Kazakh and Russian stress 
# dataset contains duration of s3 only for Kaz and CS
# Duration of s3 by Language
head(kz_cs_df)
model_c_dur <- lmer(Duration_in_ms ~ factor(Language) + (1|Speaker), data=kz_cs_df)
summary(model_c_dur)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Duration_in_ms ~ factor(Language) + (1 | Speaker)
##    Data: kz_cs_df
## 
## REML criterion at convergence: 3365
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.6393 -0.6892 -0.0484  0.6634  3.4469 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Speaker  (Intercept)   83.5    9.138  
##  Residual             2801.2   52.926  
## Number of obs: 313, groups:  Speaker, 4
## 
## Fixed effects:
##                    Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)         243.438      6.231   5.077  39.066 1.71e-07 ***
## factor(Language)CS  -10.117      5.983 308.003  -1.691   0.0919 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## fctr(Lng)CS -0.482
model_c_int <- lmer(Mean_dB ~ factor(Language) + (1|Speaker), data=kz_cs_df)
summary(model_c_int)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Mean_dB ~ factor(Language) + (1 | Speaker)
##    Data: kz_cs_df
## 
## REML criterion at convergence: 1748.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.5323 -0.4373  0.1642  0.6724  1.9683 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Speaker  (Intercept)  3.858   1.964   
##  Residual             15.224   3.902   
## Number of obs: 313, groups:  Speaker, 4
## 
## Fixed effects:
##                    Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)         68.6356     1.0306   3.2988  66.600 2.82e-06 ***
## factor(Language)CS  -0.4229     0.4411 308.0027  -0.959    0.338    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## fctr(Lng)CS -0.215
model_c_f0 <- lmer(MeanF0 ~ factor(Language) + (1|Speaker), data=kz_cs_df)
summary(model_c_f0)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: MeanF0 ~ factor(Language) + (1 | Speaker)
##    Data: kz_cs_df
## 
## REML criterion at convergence: 2611.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.0077 -0.4203  0.0517  0.6083  2.3689 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Speaker  (Intercept) 3820.0   61.81   
##  Residual              234.4   15.31   
## Number of obs: 313, groups:  Speaker, 4
## 
## Fixed effects:
##                    Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)         168.295     30.928   3.005   5.442   0.0121 *  
## factor(Language)CS   -7.169      1.731 308.000  -4.142 4.44e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## fctr(Lng)CS -0.028

OBSERVATION: Model_C

Model C: Duration

Formula: Duration_in_ms ~ factor(Language) + (1 | Speaker)

| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 243.44 | < 0.001 | Baseline s3 duration for Kazakh tokens: ~243 ms |
| factor(Language)CS | -10.12 | 0.092 | CS s3 is ~10 ms shorter; a trend, not significant at α = 0.05 |

Model C: Mean Intensity

Formula: Mean_dB ~ factor(Language) + (1 | Speaker)

| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 68.64 | < 0.001 | Baseline mean s3 intensity for Kazakh tokens: ~68.6 dB |
| factor(Language)CS | -0.42 | 0.338 | CS s3 is ~0.4 dB lower; not significant |

Model C: Mean F0

Formula: MeanF0 ~ factor(Language) + (1 | Speaker)

| Term | Estimate | p-value | Interpretation |
|---|---|---|---|
| (Intercept) | 168.30 | 0.012 | Baseline mean s3 F0 for Kazakh tokens: ~168 Hz |
| factor(Language)CS | -7.17 | < 0.001 | CS s3 has significantly lower F0 (~7 Hz drop) |
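The F0 intercept is estimated much less precisely (SE about 31 Hz) than the Language effect (SE about 1.7 Hz). One way to see why is to compare the Speaker and residual variances reported in the three Model C summaries. The sketch below copies those variance components from the output above and computes the share of variance attributable to speakers; the data frame and column names are illustrative, not part of the original analysis.

```r
# Variance components copied from the Model C summaries above
variance_components <- data.frame(
  measure  = c("Duration_in_ms", "Mean_dB", "MeanF0"),
  speaker  = c(83.5, 3.858, 3820.0),    # Speaker (Intercept) variance
  residual = c(2801.2, 15.224, 234.4)   # Residual variance
)

# Share of (speaker + residual) variance that lies between speakers
variance_components$speaker_share <- with(
  variance_components,
  speaker / (speaker + residual)
)

print(variance_components)
```

Under these numbers, almost all F0 variance lies between speakers, while very little of the duration variance does. With only four speakers, that leaves the baseline F0 poorly determined even though the within-speaker Language contrast is tight.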

Summary

| Feature | Language effect | Interpretation |
|---|---|---|
| Duration | Marginal (p = 0.092) | CS tokens trend shorter than Kazakh (~10 ms difference) |
| Mean Intensity | Not significant (p = 0.338) | No notable difference between Kazakh and CS |
| Mean F0 | Significant (p < 0.001) | CS tokens show a clear F0 drop (~7 Hz lower) |
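To make the "marginal" versus "significant" labels concrete, the Language effects can be given approximate 95% confidence intervals. The sketch below is a Wald-type calculation from the estimates, standard errors, and Satterthwaite degrees of freedom printed in the summaries above. It is an approximation for illustration, not a replacement for profile or bootstrap intervals from the fitted models, and the object name `model_c` is hypothetical.

```r
# Language (CS vs. Kazakh) effects copied from the Model C summaries above
model_c <- data.frame(
  measure  = c("Duration_in_ms", "Mean_dB", "MeanF0"),
  estimate = c(-10.117, -0.4229, -7.169),
  se       = c(5.983, 0.4411, 1.731),
  df       = c(308.003, 308.0027, 308.000)
)

# Wald-type interval: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df = model_c$df)
model_c$ci_lower <- model_c$estimate - t_crit * model_c$se
model_c$ci_upper <- model_c$estimate + t_crit * model_c$se

# TRUE when the interval lies entirely on one side of zero
model_c$excludes_zero <- model_c$ci_lower > 0 | model_c$ci_upper < 0

print(model_c)
```

Only the F0 interval excludes zero. The duration interval extends slightly above zero, which is what the p-value of 0.092 reflects.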

6.7 Plot D

  • Mobile (forward moving stress) vs immobile CS roots
  • Difference in Duration, Intensity, and F0.

6.8 Plot D

  • Difference between fixed and mobile roots
  • Duration, intensity, and F0 of s1, s2, and s3
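# Keep CS tokens only and relabel the root types:
# forward-shifting roots become "mobile", non-shifting roots ("na") become "fixed"
cs_all_syll_shift <- df_full_sample %>%
  filter(Language == "CS") %>%
  filter(ShiftDirect %in% c("forward", "na")) %>%
  mutate(ShiftDirect = recode(ShiftDirect,
                              "forward" = "mobile",
                              "na" = "fixed")) %>%
  filter(!is.na(SyllPos))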

#  filter(StressShift == "no") %>%
#   sample_n(size = 59)
# view(cs_all_syll)

# Plots with error bars: s1, s2, s3 by root stress (StressedSyll) and root type (ShiftDirect)

# Summarize duration, intensity, and F0
summary_cs_all_shift <- cs_all_syll_shift %>%
  group_by(StressedSyll, SyllPos, ShiftDirect) %>%
  summarise(
    mean_dur = mean(Duration_in_ms, na.rm = TRUE),
    sd_dur = sd(Duration_in_ms, na.rm = TRUE),

    mean_dB = mean(Mean_dB, na.rm = TRUE),
    sd_dB = sd(Mean_dB, na.rm = TRUE),

    mean_f0 = mean(MeanF0, na.rm = TRUE),
    sd_f0 = sd(MeanF0, na.rm = TRUE),

    n = n(),
    .groups = "drop"
  ) %>%
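  mutate(
    # SE = SD / sqrt(n). Note: n counts all rows in a group, including rows where
    # a measure is NA, so an SE is slightly understated if that measure has gaps.
    se_dur = sd_dur / sqrt(n),
    se_dB = sd_dB / sqrt(n),
    se_f0 = sd_f0 / sqrt(n)
  )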
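The summary above divides each standard deviation by the square root of the group row count. If a measure has missing values in a group (an assumption; the source does not report whether any do), the mean and SD use fewer observations than the row count suggests. The base R sketch below uses made-up F0 values to show how the two denominators differ; all names and numbers are illustrative.

```r
# Toy F0 values for one group; NA marks a syllable where F0 could not be measured
f0 <- c(170, 165, NA, 180, NA, 175)

n_rows  <- length(f0)        # what n() would return for this group
n_valid <- sum(!is.na(f0))   # values that actually enter mean() and sd()

sd_f0 <- sd(f0, na.rm = TRUE)

se_rows  <- sd_f0 / sqrt(n_rows)    # denominator used in the summary above
se_valid <- sd_f0 / sqrt(n_valid)   # denominator based on non-missing values

c(n_rows = n_rows, n_valid = n_valid, se_rows = se_rows, se_valid = se_valid)
```

If this matters for the data, one option is to compute a separate count per measure inside `summarise()`, for example `sum(!is.na(MeanF0))`, and use it for that measure's standard error.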


# Neutral grey palette
#grey_palette <- c("s1" = "#999999", "s2" = "#666666", "s3" = "#333333")

# Shared minimalist theme (legend removed for first 2 plots)
shared_theme <- theme_minimal(base_size = 18) +
  theme(
    axis.title = element_text(size = 18),
    axis.text = element_text(size = 16),
    legend.title = element_blank(),
    legend.text = element_text(size = 10),
    legend.position = "none"
  )

# Plot 1: Duration
cs_s1s2s3_plot <- ggplot(summary_cs_all_shift, aes(x = factor(StressedSyll), y = mean_dur, fill = SyllPos)) +
  facet_wrap(~ShiftDirect) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  geom_errorbar(
    aes(ymin = mean_dur - se_dur, ymax = mean_dur + se_dur),
    position = position_dodge(width = 0.9),
    width = 0.2
  ) +
  # scale_fill_manual(values = grey_palette) +
  labs(
    x = "Root Stress",
    y = "Duration (ms)"
  ) +
  shared_theme

print(cs_s1s2s3_plot)

# Plot 2: Intensity
cs_s1s2s3_intensity_plot <- ggplot(summary_cs_all_shift, aes(x = factor(StressedSyll), y = mean_dB, fill = SyllPos)) +
  facet_wrap(~ShiftDirect) + 
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  geom_errorbar(
    aes(ymin = mean_dB - se_dB, ymax = mean_dB + se_dB),
    position = position_dodge(width = 0.9),
    width = 0.2
  ) +
  # scale_fill_manual(values = grey_palette) +
  labs(
    x = "Root Stress",
    y = "Intensity (dB)"
  ) +
  shared_theme

print(cs_s1s2s3_intensity_plot)

# Plot 3: F0 (with legend)
cs_s1s2s3_f0_plot <- ggplot(summary_cs_all_shift, aes(x = factor(StressedSyll), y = mean_f0, fill = SyllPos)) +
  facet_wrap(~ShiftDirect) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  geom_errorbar(
    aes(ymin = mean_f0 - se_f0, ymax = mean_f0 + se_f0),
    position = position_dodge(width = 0.9),
    width = 0.2
  ) +
  # scale_fill_manual(values = grey_palette) +
  labs(
    x = "Root Stress",
    y = "F0 (Hz)",
    fill = "Syllable Position"
  ) +
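  shared_theme +
  # shared_theme hides the legend; re-enable it (with its title) for this plot only
  theme(legend.position = "right", legend.title = element_text(size = 12))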
print(cs_s1s2s3_f0_plot)

6.9 Model D

  • Fixed roots: stress does not move to s3.
  • Mobile (forward-shifting) roots: stress does move to s3.
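The models in this section use s1:s2 ratios as the response, so a value above 1 means s1 exceeds s2 on that measure. As a minimal sketch of how such a ratio can be derived from per-syllable rows, the base R example below reshapes toy data to one row per token. The data frame, token labels, and values are hypothetical and do not reproduce how `cs_roots_balanced` was actually built.

```r
# Hypothetical per-syllable durations for two tokens (toy values)
toy <- data.frame(
  token          = c("t1", "t1", "t2", "t2"),
  SyllPos        = c("s1", "s2", "s1", "s2"),
  Duration_in_ms = c(180, 120, 90, 150)
)

# One row per token, one duration column per syllable position
wide <- reshape(toy, idvar = "token", timevar = "SyllPos", direction = "wide")

# Ratio above 1: s1 is longer than s2; below 1: s2 is longer than s1
wide$ratio_s1_s2_dur <- wide$Duration_in_ms.s1 / wide$Duration_in_ms.s2

print(wide)
```

In this toy data the first token has a longer s1 and the second has a longer s2.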
view(cs_roots_balanced) 
# Fixed and mobile roots are modelled separately below.
# Within each model, WordForm is compared against the 'uninflected' reference level.

# Convert WordForm to a factor (two levels: uninflected, inflected)
cs_roots_balanced$WordForm <- factor(cs_roots_balanced$WordForm)  
# Set the reference level so coefficients describe the change in inflected forms
cs_roots_balanced$WordForm <- relevel(cs_roots_balanced$WordForm, ref = "uninflected")

cs_roots_balanced_fixed <- cs_roots_balanced %>%
  filter(ShiftDirect == "fixed")

cs_roots_balanced_mobile <- cs_roots_balanced %>%
  filter(ShiftDirect == "mobile")


# Model predictions for fixed roots
# Our prediction is that stress remains on the root

model_d_dur <- lmer(ratio_s1_s2_dur ~ WordForm*factor(root_stress) + (1|root) +(1|Speaker), data=cs_roots_balanced_fixed)
summary(model_d_dur)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: ratio_s1_s2_dur ~ WordForm * factor(root_stress) + (1 | root) +  
##     (1 | Speaker)
##    Data: cs_roots_balanced_fixed
## 
## REML criterion at convergence: 57.8
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.85222 -0.48912 -0.02762  0.38907  2.02330 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  root     (Intercept) 0.165655 0.40701 
##  Speaker  (Intercept) 0.007166 0.08465 
##  Residual             0.046051 0.21459 
## Number of obs: 59, groups:  root, 35; Speaker, 4
## 
## Fixed effects:
##                                        Estimate Std. Error       df t value
## (Intercept)                             1.16942    0.14071 29.60293   8.311
## WordForminflected                      -0.01651    0.19095 30.38455  -0.086
## factor(root_stress)2                   -0.59189    0.21506 29.53055  -2.752
## WordForminflected:factor(root_stress)2  0.06397    0.31146 29.37809   0.205
##                                        Pr(>|t|)    
## (Intercept)                            3.13e-09 ***
## WordForminflected                         0.932    
## factor(root_stress)2                      0.010 *  
## WordForminflected:factor(root_stress)2    0.839    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) WrdFrm fc(_)2
## WrdFrmnflct -0.669              
## fctr(rt_s)2 -0.597  0.435       
## WrdFrm:(_)2  0.409 -0.610 -0.684
model_d_int <- lmer(ratio_mean_int ~ WordForm*factor(root_stress) + (1|root) +(1|Speaker), data=cs_roots_balanced_fixed)
summary(model_d_int)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: ratio_mean_int ~ WordForm * factor(root_stress) + (1 | root) +  
##     (1 | Speaker)
##    Data: cs_roots_balanced_fixed
## 
## REML criterion at convergence: -111.3
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.57735 -0.53982  0.04363  0.34612  2.62401 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev.
##  root     (Intercept) 0.0049577 0.07041 
##  Speaker  (Intercept) 0.0005786 0.02405 
##  Residual             0.0029225 0.05406 
## Number of obs: 59, groups:  root, 35; Speaker, 4
## 
## Fixed effects:
##                                         Estimate Std. Error        df t value
## (Intercept)                             1.027666   0.027914 23.553046  36.816
## WordForminflected                      -0.009407   0.036020 32.782154  -0.261
## factor(root_stress)2                    0.025803   0.040324 31.097365   0.640
## WordForminflected:factor(root_stress)2 -0.002339   0.058338 30.841543  -0.040
##                                        Pr(>|t|)    
## (Intercept)                              <2e-16 ***
## WordForminflected                         0.796    
## factor(root_stress)2                      0.527    
## WordForminflected:factor(root_stress)2    0.968    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) WrdFrm fc(_)2
## WrdFrmnflct -0.629              
## fctr(rt_s)2 -0.567  0.431       
## WrdFrm:(_)2  0.386 -0.613 -0.679
model_d_f0 <- lmer(ratio_mean_f0 ~ WordForm*factor(root_stress) + (1|root) +(1|Speaker), data=cs_roots_balanced_fixed)
## boundary (singular) fit: see help('isSingular')
summary(model_d_f0)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: ratio_mean_f0 ~ WordForm * factor(root_stress) + (1 | root) +  
##     (1 | Speaker)
##    Data: cs_roots_balanced_fixed
## 
## REML criterion at convergence: -97.1
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.77533 -0.65592 -0.07482  0.65051  1.94617 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. 
##  root     (Intercept) 7.64e-12 2.764e-06
##  Speaker  (Intercept) 1.05e-03 3.240e-02
##  Residual             7.78e-03 8.821e-02
## Number of obs: 59, groups:  root, 35; Speaker, 4
## 
## Fixed effects:
##                                        Estimate Std. Error       df t value
## (Intercept)                             1.11553    0.02647  8.83079  42.150
## WordForminflected                      -0.03276    0.03092 52.37574  -1.059
## factor(root_stress)2                   -0.07224    0.03231 54.36482  -2.236
## WordForminflected:factor(root_stress)2 -0.03662    0.04667 52.64520  -0.785
##                                        Pr(>|t|)    
## (Intercept)                            1.73e-11 ***
## WordForminflected                        0.2943    
## factor(root_stress)2                     0.0295 *  
## WordForminflected:factor(root_stress)2   0.4362    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) WrdFrm fc(_)2
## WrdFrmnflct -0.526              
## fctr(rt_s)2 -0.523  0.419       
## WrdFrm:(_)2  0.347 -0.653 -0.663
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
# model_d_maxf0 <- lmer(ratio_max_fo ~ ShiftDirect*factor(root_stress) + (1|root) +(1|Speaker), data=cs_roots_balanced)
# summary(model_d_maxf0)

# Model predictions for mobile roots
# Our prediction is that stress shifts to s3, so the s1:s2 ratio should move toward 1
# in inflected forms (i.e. decrease for s1-stressed roots)


model_d_dur_m <- lmer(ratio_s1_s2_dur ~ WordForm*factor(root_stress) + (1|root) +(1|Speaker), data=cs_roots_balanced_mobile)
summary(model_d_dur_m)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: ratio_s1_s2_dur ~ WordForm * factor(root_stress) + (1 | root) +  
##     (1 | Speaker)
##    Data: cs_roots_balanced_mobile
## 
## REML criterion at convergence: 53.8
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.23976 -0.60094 -0.06697  0.31499  2.72281 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  root     (Intercept) 0.20966  0.4579  
##  Speaker  (Intercept) 0.01750  0.1323  
##  Residual             0.07155  0.2675  
## Number of obs: 59, groups:  root, 16; Speaker, 4
## 
## Fixed effects:
##                                        Estimate Std. Error      df t value
## (Intercept)                              1.1007     0.1927 12.8666   5.712
## WordForminflected                        0.1484     0.2574 11.0138   0.577
## factor(root_stress)2                    -0.3173     0.5102 10.6822  -0.622
## WordForminflected:factor(root_stress)2   0.2070     0.7220 10.7057   0.287
##                                        Pr(>|t|)    
## (Intercept)                            7.44e-05 ***
## WordForminflected                         0.576    
## factor(root_stress)2                      0.547    
## WordForminflected:factor(root_stress)2    0.780    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) WrdFrm fc(_)2
## WrdFrmnflct -0.661              
## fctr(rt_s)2 -0.333  0.250       
## WrdFrm:(_)2  0.236 -0.356 -0.707
model_d_int_m <- lmer(ratio_mean_int ~ WordForm*factor(root_stress) + (1|root) +(1|Speaker), data=cs_roots_balanced_mobile)
summary(model_d_int_m)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: ratio_mean_int ~ WordForm * factor(root_stress) + (1 | root) +  
##     (1 | Speaker)
##    Data: cs_roots_balanced_mobile
## 
## REML criterion at convergence: -153.4
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.66362 -0.56755 -0.04577  0.37121  2.70992 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev.
##  root     (Intercept) 0.0021722 0.04661 
##  Speaker  (Intercept) 0.0001511 0.01229 
##  Residual             0.0020940 0.04576 
## Number of obs: 59, groups:  root, 16; Speaker, 4
## 
## Fixed effects:
##                                        Estimate Std. Error       df t value
## (Intercept)                             1.09386    0.02073 12.85485  52.761
## WordForminflected                      -0.06060    0.02827 12.41125  -2.143
## factor(root_stress)2                   -0.05649    0.05557 11.81606  -1.017
## WordForminflected:factor(root_stress)2 -0.02180    0.07868 11.84631  -0.277
##                                        Pr(>|t|)    
## (Intercept)                              <2e-16 ***
## WordForminflected                        0.0525 .  
## factor(root_stress)2                     0.3297    
## WordForminflected:factor(root_stress)2   0.7865    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) WrdFrm fc(_)2
## WrdFrmnflct -0.669              
## fctr(rt_s)2 -0.340  0.250       
## WrdFrm:(_)2  0.240 -0.359 -0.706
model_d_f0_m <- lmer(ratio_mean_f0 ~ WordForm*factor(root_stress) + (1|root) +(1|Speaker), data=cs_roots_balanced_mobile)
## boundary (singular) fit: see help('isSingular')
summary(model_d_f0_m)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: ratio_mean_f0 ~ WordForm * factor(root_stress) + (1 | root) +  
##     (1 | Speaker)
##    Data: cs_roots_balanced_mobile
## 
## REML criterion at convergence: -60.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.7667 -0.6269  0.0946  0.4130  2.7693 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  root     (Intercept) 0.000000 0.00000 
##  Speaker  (Intercept) 0.002064 0.04543 
##  Residual             0.015615 0.12496 
## Number of obs: 59, groups:  root, 16; Speaker, 4
## 
## Fixed effects:
##                                        Estimate Std. Error       df t value
## (Intercept)                             1.05788    0.03343  6.05827  31.645
## WordForminflected                      -0.03095    0.03504 52.12024  -0.883
## factor(root_stress)2                    0.04436    0.06712 52.00717   0.661
## WordForminflected:factor(root_stress)2 -0.07679    0.09506 52.01201  -0.808
##                                        Pr(>|t|)    
## (Intercept)                            5.84e-08 ***
## WordForminflected                         0.381    
## factor(root_stress)2                      0.512    
## WordForminflected:factor(root_stress)2    0.423    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) WrdFrm fc(_)2
## WrdFrmnflct -0.514              
## fctr(rt_s)2 -0.268  0.256       
## WrdFrm:(_)2  0.189 -0.369 -0.706
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

OBSERVATION: Model_D

  • Model Predictions Summary: Fixed Roots
model_predictions <- read_csv("/Users/aidyn/Downloads/Fixed_Roots_Model_Predictions.csv")
## Rows: 12 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Model, Dataset, Formula, Term, Interpretation
## dbl (2): Estimate, p-value
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
model_predictions 
  • Model Predictions Summary: Mobile Roots
model_predictions_mobile <- read_csv("/Users/aidyn/Downloads/Mobile_Roots_Model_Predictions.csv")
## Rows: 12 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Model, Dataset, Formula, Term, Interpretation
## dbl (2): Estimate, p-value
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
model_predictions_mobile 

Summary

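| Feature | Fixed Roots | Mobile Roots |
|---|---|---|
| Duration | Root stress affects the s1:s2 ratio (p = 0.010). Inflection has no effect (p = 0.932). | No significant effects. |
| Intensity | No significant effects. | Trend toward reduction in inflected forms: inflection marginally lowers the s1:s2 intensity ratio (p = 0.053), consistent with a stress shift. |
| F0 | Root stress affects the s1:s2 F0 ratio (root stress = 2 lowers it, p = 0.030). Singular fit: root variance estimated at zero. | No significant F0 effects. Singular fit: root variance estimated at zero. |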

Fixed roots maintain stress on the root: root stress position significantly affects the s1:s2 duration ratio and F0 ratio, while inflection has no detectable effect on either.

Mobile roots show some acoustic evidence of stress shifting, but only in intensity (p = 0.053); the effect is weak and not consistent across measures.

Inflection alone is not a strong predictor of stress shift, but in mobile roots it may serve as a cue for reduced root prominence.
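Because WordForm and root stress are treatment-coded with an interaction, the coefficients are easier to read once they are combined into one predicted ratio per cell. The sketch below does this for the fixed-root duration model, using the fixed-effect estimates printed in the `model_d_dur` summary above. It works from the copied numbers rather than the fitted object, and the vector and column names are illustrative.

```r
# Fixed-effect estimates copied from summary(model_d_dur) above
coefs <- c(
  intercept         = 1.16942,   # uninflected, root stress = 1
  inflected         = -0.01651,  # WordForminflected
  stress2           = -0.59189,  # factor(root_stress)2
  inflected_stress2 = 0.06397    # interaction term
)

# All four WordForm x root_stress cells
cells <- expand.grid(
  WordForm    = c("uninflected", "inflected"),
  root_stress = c(1, 2)
)

# Treatment-coded design matrix: intercept, two dummies, and their product
X <- cbind(
  1,
  cells$WordForm == "inflected",
  cells$root_stress == 2,
  cells$WordForm == "inflected" & cells$root_stress == 2
)

cells$predicted_ratio <- as.vector(X %*% coefs)
print(cells)
```

The predicted s1:s2 duration ratio is above 1 for s1-stressed roots and below 1 for s2-stressed roots, and it changes very little between uninflected and inflected forms. This is the pattern the summary describes for fixed roots.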

# Test H4: Stress follows Russian suffixation rules
# s1 and s2 would have significantly longer duration than s3. 
# dataset contains durations of all s1,s2, s3 for fixed and mobile roots

# model_cs_all_shift <- lmer(Duration_in_ms ~ ShiftDirect + SyllPos + Stress +  (1|Speaker), data = cs_all_syll_shift)
# summary(model_cs_all_shift)

7 Conclusion

These results indicate that stress in CS nouns remains on the Russian root, while final syllables exhibit Kazakh-style lengthening (s3 duration in CS tokens is not significantly different from Kazakh s3; Model C, p = 0.092), supporting a hybrid prosodic pattern. This outcome suggests that bilinguals represent and coordinate multiple phonological systems even at the word level.

7.1 R session info

## R version 4.4.2 (2024-10-31)
## Platform: x86_64-apple-darwin20
## Running under: macOS Ventura 13.7.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Los_Angeles
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] lmerTest_3.1-3  lme4_1.1-36     Matrix_1.7-1    patchwork_1.3.0
##  [5] modelr_0.1.11   lubridate_1.9.3 forcats_1.0.0   stringr_1.5.1  
##  [9] dplyr_1.1.4     purrr_1.0.2     readr_2.1.5     tidyr_1.3.1    
## [13] tibble_3.2.1    ggplot2_3.5.1   tidyverse_2.0.0
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6        xfun_0.49           bslib_0.8.0        
##  [4] lattice_0.22-6      numDeriv_2016.8-1.1 tzdb_0.4.0         
##  [7] Rdpack_2.6.2        vctrs_0.6.5         tools_4.4.2        
## [10] generics_0.1.3      parallel_4.4.2      fansi_1.0.6        
## [13] pkgconfig_2.0.3     lifecycle_1.0.4     compiler_4.4.2     
## [16] farver_2.1.2        textshaping_0.4.0   munsell_0.5.1      
## [19] htmltools_0.5.8.1   sass_0.4.9          yaml_2.3.10        
## [22] pillar_1.9.0        nloptr_2.1.1        crayon_1.5.3       
## [25] jquerylib_0.1.4     MASS_7.3-61         cachem_1.1.0       
## [28] reformulas_0.4.0    boot_1.3-31         nlme_3.1-166       
## [31] tidyselect_1.2.1    digest_0.6.37       stringi_1.8.7      
## [34] labeling_0.4.3      splines_4.4.2       fastmap_1.2.0      
## [37] grid_4.4.2          colorspace_2.1-1    cli_3.6.3          
## [40] magrittr_2.0.3      utf8_1.2.4          broom_1.0.7        
## [43] withr_3.0.2         scales_1.3.0        backports_1.5.0    
## [46] bit64_4.5.2         timechange_0.3.0    rmarkdown_2.29     
## [49] bit_4.5.0           ragg_1.3.3          hms_1.1.3          
## [52] evaluate_1.0.1      knitr_1.49          rbibutils_2.3      
## [55] rlang_1.1.4         Rcpp_1.0.14         glue_1.8.0         
## [58] rstudioapi_0.17.1   vroom_1.6.5         minqa_1.2.8        
## [61] jsonlite_1.8.9      R6_2.5.1            systemfonts_1.1.0